Hello,

This is a great start! I am interested in helping with the development of a
crowd sourcing application. The next step would be creating a set of
requirements for the web app. Would the ORP wiki be a good place to store the
requirements?

--Dan

On Tue, Sep 14, 2010 at 9:51 AM, Grant Ingersoll <[email protected]> wrote:

> I think the biggest hurdle we have in front of us is curating a data set
> that we can redistribute. I'm in the process of uploading all the ASF
> public mail archives as of Sept. 13 to Amazon S3. I also have some tools
> (thanks to Chris Rhodes) for processing this into Solr XML. I think this
> would give us a standard corpus to start with and would fairly well mimic
> some enterprise search/eDiscovery tasks pretty well.
>
> At any rate, as with any community, the proof is in people stepping up to
> help out. I like that so many people suggested we keep going. As for what
> to do, I think the options are pretty wide open and there is opportunity for
> people to define the project w/o any previous encumbrances.
>
> Some ideas that have been kicked around in the past:
> 1. Creative-commons data set, judgments, queries
> 2. Open Street Map (spatial search)
> 3. Mail archives
> 4. A crowd sourcing application. Given a set of documents and queries,
> have people provide judgments. Ideally, this runs in a web container and we
> could probably even find resources to host it here. Combining that with one
> of the items above, we would be on our way. App could also solicit queries
> by providing users open search box and opportunities to browse the data.
>
> I know much of this is simplistic, but it is a start.
>
> -Grant
>
>
> On Sep 13, 2010, at 9:04 PM, Dan Cardin wrote:
>
> > Hello,
> >
> > I am new to ORP. I would like to contribute to the project. I do not have a
> > lot of experience in this field of IR, crowd sourcing or AI. If someone
> > could take the lead and set forward path I would be willing to contribute my
> > skill set to ORP.
> >
> > How can I help? I have a lot of experience doing software development and
> > system administration.
> >
> > Cheers,
> > --Dan
> >
> > On Mon, Sep 13, 2010 at 1:36 PM, Omar Alonso <[email protected]> wrote:
> >
> >> I think ORP is a great candidate for crowdsourcing/human computation. In
> >> the last year or so there's been quite a bit of research and applications on
> >> this. See the page for the SIGIR workshop on using crowdsourcing for IR
> >> evaluation:
> >> http://www.ischool.utexas.edu/~cse2010/
> >>
> >> Omar
> >>
> >> --- On Mon, 9/13/10, Itamar Syn-Hershko <[email protected]> wrote:
> >>
> >>> From: Itamar Syn-Hershko <[email protected]>
> >>> Subject: Re: Whither ORP?
> >>> To: [email protected]
> >>> Date: Monday, September 13, 2010, 9:33 AM
> >>> With the proper two-way open-source
> >>> development process (taking and then giving) I think it can
> >>> become an important part of open-IR technologies, just like
> >>> what Lucene did to the search engines world. What ORP has to
> >>> offer is of great interest to HebMorph, an open-source
> >>> project of mine trying to decide on what is the best way to
> >>> index and search Hebrew texts.
> >>>
> >>> To this end I decided to put some of the development
> >>> efforts of the HebMorph project into making tools for the
> >>> ORP. I have announced this before, but unfortunately I had
> >>> to attend to more pressing tasks before I could complete
> >>> this (and there was no response from the community
> >>> anyway...). Just in case you're interested in seeing what I
> >>> came up with so far: http://github.com/synhershko/Orev.
> >>>
> >>> IMHO, the ORP should stand by itself, and relate to
> >>> Lucene/Solr only as its basis framework for these initial
> >>> stages. Perhaps also try to attract more people who could
> >>> find an interest in what it has to offer, so it can really
> >>> start growing.
> >>>
> >>> Itamar.
> >>>
> >>> On 12/9/2010 1:29 PM, Grant Ingersoll wrote:
> >>>> On Sep 11, 2010, at 8:51 PM, Robert Muir wrote:
> >>>>
> >>>>
> >>>>> i propose we take what we have and import into lucene-java's benchmark
> >>>>> contrib. it already has integration with wikipedia and reuters for perf
> >>>>> purposes, and the quality package is actually there anyways. later, maybe
> >>>>> more people have time and contrib/benchmark evolves naturally... e.g. to
> >>>>> modules/benchmark with solr support as a first big step.
> >>>>>
> >>>> Yeah, that seems reasonable. I have been thinking lately that it might
> >>>> be useful to pull our DocMaker stuff out separately from benchmark so
> >>>> that people have easy ways of generating content from things like
> >>>> Wikipedia, etc.
> >>>>
> >>>> Still, at the end of the day, I like what ORP _could_ bring to the
> >>>> table and to some extent I think that is lost by folding it into
> >>>> Lucene benchmark.
> >>>>
> >>>>
> >>>>> On Sep 11, 2010 7:33 PM, "Grant Ingersoll"<[email protected]> wrote:
> >>>>>
> >>>>>> Seems ORP isn't really catching on with people. I know personally I don't
> >>>>>> have the time I had hoped to have to get it going. At the same time, I
> >>>>>> really think it could be a good project. We've got some tools put together,
> >>>>>> but we still haven't done much about the bigger goal of a "self contained"
> >>>>>> evaluation.
> >>>>>>
> >>>>>> Any thoughts on how we should proceed with ORP?
> >>>>>>
> >>>>>> -Grant
> >>>>>>
>
> --------------------------
> Grant Ingersoll
> http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
>
