Hi Wojtek,

Some comments inline:
> Found elements would be stored with current document data.

I don't see any reason for storing the entire document content in the index; why would you do that? Couldn't you just store a URI that points to the original file?

> *All* | All above fields to provide non-filter queries.

You don't need to do that. If you do, you will basically double the size of the posting lists in the index. To get what you want, the field "all" should exist only for querying purposes. For example, the user could type 'all:store' in the query, and before the query is processed it could be expanded to all the "searchable" fields: component:store contribution:store ... etc. This is common practice in the unstructured-data world, so common that Lucene has a query parser for it called MultiFieldQueryParser : ) ... I think you already said it here:

> none (all document fields would be used to search)

> regular expressions

Can you give me some examples of the regular expressions you have in mind?

I liked your presentation idea : ) The architectural outline section is also very complete : )

Good luck ; )

Adriano Crestani

On Thu, Apr 2, 2009 at 3:24 PM, Wojtek Janiszewski <[email protected]> wrote:

> Hi, Adriano.
>
> Thanks for input. I've included your comments in updated proposal [1].
> (previous timeline was only pattern and I was going to update it later:)).
>
> Thanks,
> Wojtek
>
> [1] -
> http://cwiki.apache.org/confluence/display/TUSCANYWIKI/Searching+artifacts+across+SCA+domain
>
> Adriano Crestani wrote:
>
>> Hi Wojtek,
>>
>> nice proposal : )
>>
>> Indexing should include all available contributions. File names as well as
>> their contents (except non readable files like Java classes) should be
>> indexed. Every indexed item should have link to its contribution parent.
>>
>> I agree about a link to contributions... actually, if you make the
>> contributions the main search target, I mean, if contributions are what
>> the user wants as results, every indexed term would point to a
>> contribution, so it already has a link to the contribution : ) . I only
>> disagree when you say that Java classes are non-readable. They are
>> readable: they have class/method/variable/annotation names. Even a .zip is
>> readable: you could open it and index the names of the files it contains,
>> as well as the contents of those files, if readable.
>>
>> - Maybe we should consider candies like Ajax hints while typing search
>> phrase?
>>
>> It would be reeeeally cool : ), but it's not a priority. It could easily be
>> added later, after everything else is working : )
>>
>> -- simply search for files by name
>>
>> I would recommend indexing file names in a specific Lucene field, like
>> "filename", so the query could be filename:(contributionname.composite).
>> Otherwise, if the user types only contributionname.composite, the search
>> could look for this text in every field contained in the index. Lucene has
>> a special feature for that, so it's easy to implement. Associating terms
>> with a field is always good for filtering :)
>>
>> Proposal:
>>
>> > preview link (if item is readable)
>>
>> If the item is not readable, a link could also be provided for downloading
>> it : )
>>
>> Could you please provide us with a more detailed timeline?
>>
>> I think you should add more detail about how the text will be parsed
>> and indexed. The way you do this is very important because it determines
>> how the documents/contributions/artifacts can be searched and what kind of
>> results can be provided to the user.
>>
>> Best Regards,
>> Adriano Crestani
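A rough sketch of the "all" pseudo-field expansion discussed above, in plain Java with no Lucene dependency: the query-only field "all" (and any bare term) is rewritten into one clause per searchable field before the query reaches the parser, while clauses already scoped to a real field, such as filename:..., are left untouched. The class name AllFieldExpander and the field list are hypothetical, and the naive whitespace tokenizer ignores phrases and boolean operators; Lucene's MultiFieldQueryParser is the real tool for this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene or Tuscany): rewrites the
// query-only pseudo-field "all", and bare terms, into one clause per
// searchable field. Nothing extra has to be stored in the index.
final class AllFieldExpander {

    private final List<String> searchableFields;

    AllFieldExpander(List<String> searchableFields) {
        this.searchableFields = List.copyOf(searchableFields);
    }

    /** Expands a whitespace-separated query such as "all:store filename:x.composite". */
    String expand(String query) {
        List<String> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty query
            }
            int colon = token.indexOf(':');
            if (colon < 0) {
                // Bare term: search every field, same as "all:term".
                addForAllFields(token, clauses);
            } else if (token.substring(0, colon).equals("all")) {
                addForAllFields(token.substring(colon + 1), clauses);
            } else {
                // Already scoped to a real field (e.g. filename:...): keep as-is.
                clauses.add(token);
            }
        }
        return String.join(" ", clauses);
    }

    private void addForAllFields(String term, List<String> clauses) {
        for (String field : searchableFields) {
            clauses.add(field + ":" + term);
        }
    }
}
```

For example, with the fields component and contribution, `all:store filename:store.composite` becomes `component:store contribution:store filename:store.composite`. Since Lucene's query parser combines clauses with OR by default, a document matches if any field contains the term.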
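A minimal sketch of two of the indexing suggestions in this thread (keep only a pointer to the original file, and give file names their own field), written as a toy in-memory inverted index whose postings hold contribution URIs while the artifact text is tokenized and then discarded. The class UriIndex, the field names "filename" and "content", and the example URI are hypothetical. In Lucene, the same split would be made per field, storing the URI field (Field.Store.YES) and indexing the content without storing it (Field.Store.NO).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy inverted index: postings point to a contribution URI,
// and the artifact text itself is never stored.
final class UriIndex {

    // key = "field:term", value = URIs of the contributions containing that term
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexes one artifact; only the contribution URI is kept, not the content. */
    void add(String contributionUri, String fileName, String content) {
        // The file name gets its own field so "filename:..." queries can filter on it.
        post("filename", fileName.toLowerCase(Locale.ROOT), contributionUri);
        // Very naive analyzer: lowercase, split on non-word characters.
        for (String term : content.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                post("content", term, contributionUri);
            }
        }
    }

    /** Returns the contribution URIs matching a single field:term clause. */
    Set<String> search(String field, String term) {
        Set<String> hits = postings.get(field + ":" + term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(hits);
    }

    private void post(String field, String term, String uri) {
        postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(uri);
    }
}
```

After `add("file:/contributions/store.jar", "store.composite", "<composite name=\"store\"/>")`, both `search("content", "store")` and `search("filename", "store.composite")` return the URI; the composite text itself is not kept, so a preview or download link would be served from the original file.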
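To illustrate the point that archives are readable too, here is a small sketch using only java.util.zip: it walks the entries of a .zip or .jar stream and collects the file names, which could then be indexed (for example under a filename field). The class name ArchiveEntryNames is hypothetical, and reading and tokenizing the content of each readable entry is only indicated by a comment.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: collects the names of the files inside an archive
// (.zip, .jar) so they can be indexed like any other file name.
final class ArchiveEntryNames {

    private ArchiveEntryNames() {
    }

    static List<String> list(InputStream archive) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            // getNextEntry() skips any unread data of the previous entry.
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
                // Readable entries (XML, .composite, etc.) could be read from
                // 'zip' here and tokenized into a content field.
            }
        }
        return names;
    }
}
```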
