This is my first message to the list. Nice to meet you all. (I've joined
this list because I've been a SourceForge user for 10 years and I'm looking
forward to contributing to the development of Allura.)
I'm an experienced web scraping programmer, so I think I can team up with
you, Davide, on the task of collecting data from other sites.
I've written my own crawlers, and I'm also familiar with Scrapy
(http://scrapy.org/), a well-made, easy-to-use Python crawling/scraping
solution.
Please let me know if you need a hand with this.
On 01/15/2013 07:45 PM, Rich Bowen wrote:
On Jan 15, 2013, at 12:31 PM, Davide Galletti wrote:
Hi everybody,
my name is Davide Galletti and I am working on an EEC-funded research project
named MARKOS.
Welcome!
Within MARKOS I will implement a component called "Crawler", which will be
responsible for gathering as much information on OSS projects as possible
from forges, metaforges and any source we might find interesting. The first
release is expected in 2013, and development will continue until the end of
2014, of course under an OSS license.
We also expect to contribute in other directions; for instance, we might
consider helping Apache maintain the DOAP files of its projects.
I'm very interested in this. In fact, just last week I was looking at the
DOAP listing
(https://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/files.xml)
and noticed that numerous projects are missing. We'd love to see that list be
complete.
I would be happy if the Crawler component could become useful to the Allura
platform; the benefit would be that a user searching on Allura would also find
projects hosted elsewhere. Within Allura there could be a detail page on each
such project, from which the user could jump to the external project or
download pages.
If this makes sense, I hope that you will keep an eye on this project and
maybe also give me some hints.
We'd love to see more of your ideas in this direction. I'm more on the
community side than technical, so I'll leave it to others to give you specific
technical direction, but mostly, we'd love to see you jump in and make things
happen.