Hi Davide and Santiago, 

Welcome to the list. The MARKOS/Crawler sounds interesting.

To integrate it into Allura, you would most likely want to make it a separate 
tool. Allura is pluggable: its capabilities can be augmented by external 
tools. Our documentation on creating a tool from scratch is not great (it may 
not exist yet), but for a simple example see 
http://sourceforge.net/p/forgepastebin.

This is a tool that is separate from the Allura codebase, but which we use in 
the SourceForge instance of Allura. It's simple enough that you can probably 
see how to write your own tool just by reading through the code.

If you have any questions as you go, don't hesitate to ask here! 

-- 
Tim Van Steenburgh


On Tuesday, January 15, 2013 at 3:43 PM, Santiago Lizardo wrote:

> This is my first message to the list. Nice to meet you all. (I've joined 
> this list because I've been a SourceForge user for 10 years and I'm looking 
> forward to contributing to the development of Allura.)
> 
> I'm an experienced web scraping programmer, so I think I can team up with 
> you, Davide, on the task of collecting data from other sites.
> 
> I've written my own crawlers, and I'm also familiar with Scrapy 
> (http://scrapy.org/), a Python crawling/scraping solution that is very well 
> made and easy to use.
> 
> Please let me know if you need a hand with this.
> 
> On 01/15/2013 07:45 PM, Rich Bowen wrote:
> > On Jan 15, 2013, at 12:31 PM, Davide Galletti wrote:
> > 
> > > Hi everybody,
> > > 
> > > my name is Davide Galletti and I am working on an EEC-funded research 
> > > project named MARKOS;
> > Welcome!
> > > Within MARKOS I will implement a component called "Crawler", which will be
> > > responsible for gathering as much information on OSS projects as possible
> > > from forges, metaforges, and any other source we might find interesting.
> > > The first release is expected in 2013, and development will continue until
> > > the end of 2014, of course under an OSS license.
> > > We also expect to contribute in other directions; for instance, we might
> > > consider helping Apache maintain the DOAP files of their projects.
> > > 
> > 
> > I'm very interested in this. In fact, just last week I was looking at the 
> > DOAP listing 
> > (https://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/files.xml)
> > and noticed that numerous projects are missing. We'd love to see that list 
> > be complete.
> > 
> > > I would be happy if the Crawler component could become useful to the Allura
> > > platform; the benefit would be that a user searching on Allura would also
> > > find projects hosted elsewhere. Within Allura there could be a detail page
> > > for each such project, from which the user could then jump to the external
> > > project or download pages.
> > > 
> > > If this makes sense, I hope that you will keep an eye on this project and
> > > maybe also give me some hints.
> > > 
> > 
> > 
> > We'd love to see more of your ideas in this direction. I'm more on the 
> > community side than technical, so I'll leave it to others to give you 
> > specific technical direction, but mostly, we'd love to see you jump in and 
> > make things happen. 
