This is my first message to the list. Nice to meet you all. (I've joined this list because I've been a SourceForge user for 10 years and I'm looking forward to contributing to the development of Allura.)

I'm an experienced web scraping programmer, so I think I can team up with you, Davide, on the task of collecting data from other sites.

I've written my own crawlers, but I'm also familiar with Scrapy (http://scrapy.org/), a very well-made and easy-to-use Python crawling/scraping solution.

Please let me know if you need a hand with this.

On 01/15/2013 07:45 PM, Rich Bowen wrote:
On Jan 15, 2013, at 12:31 PM, Davide Galletti wrote:

Hi everybody,

my name is Davide Galletti and I am working on an EEC-funded research project
named MARKOS;

Welcome!

Within MARKOS I will build a component called "Crawler" which will be
responsible for gathering as much information on OSS projects as possible
from forges, metaforges and any source we might find interesting. The first
release is expected in 2013, and development will continue until the end of
2014, of course under an OSS license.
We expect to contribute in other directions as well; for instance, we might
consider helping Apache with the maintenance of the DOAP files of their
projects.

I'm very interested in this. In fact, just last week I was looking at the DOAP
listing
(https://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/files.xml)
and noticed that numerous projects are missing. We'd love to see that list be
complete.

I would be happy if the Crawler component could become useful to the Allura
platform; the benefit would be that a user searching on Allura would also find
projects hosted elsewhere; within Allura there could be a detail page on the
project from which the user could eventually jump to the external project or
download pages.

If this makes sense, I hope that you will keep an eye on this project and maybe
also give me some hints,

We'd love to see more of your ideas in this direction. I'm more on the 
community side than technical, so I'll leave it to others to give you specific 
technical direction, but mostly, we'd love to see you jump in and make things 
happen.


