You're welcome!

Mike

http://blog.mikemccandless.com

On Sun, May 22, 2011 at 9:20 AM, zhoucheng2008 <zhoucheng2...@gmail.com> wrote:
> Great, thanks Mike.
>
> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Sunday, May 22, 2011 8:09 PM
> To: java-user@lucene.apache.org
> Subject: Re: How to create document objects in our case
>
> Norms is how Lucene records what the apriori boost is for each
> docXfield. This boost is the product of per-field boost, per-doc
> boost (both of which your app would set when it creates the doc), as
> well as the "length normalization" Lucene's default similarity applies
> (shorter docs have higher boost).
>
> It's a quantized float, encoded as 8 bits.
>
> If your fields tend not to vary in length much, and you don't use
> boosting yourself, and you are worried about RAM, you should omit the
> norms. (Field has a method to omit norms).
>
> Not sure when nested docs will be available but I hope soon... even
> so, once it's committed, it won't be released until the next release
> (though you can use it on the trunk/3.x tip as well, once it's
> committed).
>
> Mike
>
> http://blog.mikemccandless.com
>
> On Sun, May 22, 2011 at 7:49 AM, zhoucheng2008 <zhoucheng2...@gmail.com> wrote:
>> Mike, thanks for reply.
>>
>> Can you please elaborate a little bit more on " If you don't need norms
>> (don't boost, lengths don't vary much or you
>> don't care to have field length impact scoring) you can omit norms"?
>>
>> When do you expect the handling of nested document will be applicable?
>>
>> Cheng
>>
>>
>> -----Original Message-----
>> From: Michael McCandless [mailto:luc...@mikemccandless.com]
>> Sent: Sunday, May 22, 2011 6:58 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: How to create document objects in our case
>>
>> 30 fields is fine, but if they are all indexed you should watch out
>> for memory usage. Ie, norms require 1 byte per doc per indexed field.
>> If you don't need norms (don't boost, lengths don't vary much or you
>> don't care to have field length impact scoring) you can omit norms.
>>
>> The relationship b/w Group and Subgroup is not something Lucene can do
>> today -- all docs are "independent" in a Lucene index.
>>
>> So.. you can either "denormalize", meaning duplicate all Group fields
>> onto each subgroup, or you can check out
>> https://issues.apache.org/jira/browse/LUCENE-2454 which I think adds
>> exactly what you need. It's not yet committed, but we are finally
>> starting to make progress towards that, eg
>> https://issues.apache.org/jira/browse/LUCENE-3112
>>
>> Mike
>>
>> http://blog.mikemccandless.com
>>
>> On Fri, May 20, 2011 at 8:27 PM, Cheng Zhou <zhoucheng2...@gmail.com> wrote:
>>> Hi,
>>>
>>> I have a large number of XML files to be indexed by Lucene. All the files
>>> share similar structure as below:
>>>
>>> <Group id="abc" member="cde" blah blah ....>
>>> <Subgroup id="abc1" member ="fgh" blah blah ...>
>>> <Subgroup id="abc2" member ="fgh" blah blah ...>
>>> <Subgroup id="abc3" member ="fgh" blah blah ...>
>>> ......
>>> </Group>
>>>
>>> Things to be noted are:
>>>
>>> The root element of Group has 30 or so attributes, and it usually has over
>>> 2000 Subgroup elements, which in turn also have more than 20 attributes.
>>>
>>> I want to create one Document object which holds the contents of the Group
>>> element, and one Document object which holds all the Subgroup elements.
>>>
>>> Here are my challenges however:
>>>
>>> 1. How many fields are advised for a Document to be indexed by Lucene? Will
>>> over 30 fields (for the Group element) be too many?
>>>
>>> 2. How to create a Document object and fields for holding all the Subgroup
>>> elements? Is this a good way to think of?
>>>
>>> 3. How can I link the Document object of the Group element to the Document
>>> object of all the Subgroup elements?
>>>
>>> Please note that I intend to use such two Document objects to achieve the
>>> group while I don't know whether it is a good solution or not. I am open to
>>> using more than two Documents to do the job, but I don't know how to connect
>>> all the objects in Lucene.
>>>
>>> Many thanks!
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
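
For anyone reading this thread later, here is a rough sketch in plain Java of two points made above: the "denormalize" option (copy all Group fields onto each Subgroup so every record stands alone) and the norms cost of 1 byte per doc per indexed field. It does not call Lucene at all. The class and method names, the "group."/"subgroup." key prefixes, and the figure of 10,000 XML files are assumptions made up for illustration; only the 30 and 20 attribute counts and the 2000 Subgroups per Group come from the original question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of Lucene.
class DenormalizeSketch {

    // Norms cost 1 byte per document per indexed field that keeps norms.
    static long normsBytes(long docCount, int indexedFieldsWithNorms) {
        return docCount * indexedFieldsWithNorms;
    }

    // Copy every Group attribute onto each Subgroup, producing one flat,
    // independent record per Subgroup. Keys are prefixed so that attributes
    // sharing a name (e.g. "id") do not overwrite each other.
    static List<Map<String, String>> denormalize(
            Map<String, String> group,
            List<Map<String, String>> subgroups) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Map<String, String> subgroup : subgroups) {
            Map<String, String> doc = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : group.entrySet()) {
                doc.put("group." + e.getKey(), e.getValue());
            }
            for (Map.Entry<String, String> e : subgroup.entrySet()) {
                doc.put("subgroup." + e.getKey(), e.getValue());
            }
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        // Attribute values taken from the XML sample in the question.
        Map<String, String> group = new LinkedHashMap<>();
        group.put("id", "abc");
        group.put("member", "cde");

        List<Map<String, String>> subgroups = new ArrayList<>();
        for (String id : new String[] {"abc1", "abc2", "abc3"}) {
            Map<String, String> subgroup = new LinkedHashMap<>();
            subgroup.put("id", id);
            subgroup.put("member", "fgh");
            subgroups.add(subgroup);
        }

        for (Map<String, String> doc : denormalize(group, subgroups)) {
            System.out.println(doc);
        }

        // Assumed corpus size: 10,000 files (hypothetical), each with
        // 2000 Subgroups. After denormalizing, each record carries
        // 30 Group + 20 Subgroup attributes.
        long docCount = 10_000L * 2_000L;
        int fieldsPerDoc = 30 + 20;
        System.out.println("norms bytes if every field is indexed with norms: "
                + normsBytes(docCount, fieldsPerDoc));
    }
}
```

In a real indexer each map would become one Lucene Document with one Field per entry, and norms would be omitted on fields that do not need boosting or length normalization, using the Field method Mike mentions. Under the assumed corpus size the estimate is about 1 GB of norms, which is why omitting them can matter once the Group fields are duplicated onto every Subgroup.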