On Wed, May 22, 2013 at 11:28 AM, Brendan Grainger <brendan.grain...@gmail.com> wrote:
> Hi All,
>
> Sorry if this is a stupid question, but I'm still catching up with some of
> the new APIs and I want to make sure my assumptions are correct.
>
> Anyway, I'm using the Solr PathHierarchyTokenizer to create a number of
> paths, e.g. for a book object with a category field of
> /compsci/search/lucene, the PathHierarchyTokenizer creates the following
> tokens, and they are added to a multivalued field called 'categories':
>
> /compsci
> /compsci/search
> /compsci/search/lucene
>
> I then want to iterate over these categories using a TermsEnum. This is the
> relevant code:
>
> Terms terms = fields.terms("categories");
> if (terms == null) return null;  // no terms indexed for this field
> TermsEnum termsEnum = terms.iterator(null);
>
> BytesRef text;
> while ((text = termsEnum.next()) != null) {
>     System.out.println("field=categories; text=" + text.utf8ToString());
> }
>
> My question is: is it guaranteed that the order of the terms as they're
> enumerated will be
>
> /compsci
> /compsci/search
> /compsci/search/lucene
>
> and, if in another document I added /compsci/graphics/3d, that the terms as
> I enumerate them would be:
>
> /compsci
> /compsci/graphics
> /compsci/graphics/3d
> /compsci/search
> /compsci/search/lucene

The short answer is "yes".

Longer answer: terms are sorted according to the codec's
TermsConsumer.getComparator(), but all codecs I know of just use the Unicode
comparator (BytesRef.getUTF8SortedAsUnicodeComparator).

Mike McCandless

http://blog.mikemccandless.com
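
To see why that comparator yields the order asked about, here is a minimal sketch in plain Java that sorts the example paths by their UTF-8 bytes, comparing each byte as unsigned. It does not call Lucene at all; the class name TermOrderSketch, the comparator UTF8_BYTE_ORDER, and the helper sortLikeTermsEnum are hypothetical names introduced for illustration, and the assumption is that the codec in use really does sort terms in this byte order, as described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TermOrderSketch {

    // Hypothetical stand-in for the term comparator: orders strings by their
    // UTF-8 encoding, comparing byte by byte with each byte read as unsigned.
    static final Comparator<String> UTF8_BYTE_ORDER = (a, b) -> {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xff) - (y[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        // All shared bytes are equal: the shorter string (a prefix) sorts first.
        return x.length - y.length;
    };

    // Returns a sorted copy of the given terms; the input list is not modified.
    static List<String> sortLikeTermsEnum(List<String> terms) {
        List<String> sorted = new ArrayList<>(terms);
        sorted.sort(UTF8_BYTE_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        // The five distinct terms from the two example documents, shuffled.
        List<String> terms = Arrays.asList(
                "/compsci/search/lucene",
                "/compsci/graphics/3d",
                "/compsci",
                "/compsci/search",
                "/compsci/graphics");
        for (String term : sortLikeTermsEnum(terms)) {
            System.out.println(term);
        }
    }
}
```

Because a path is a byte prefix of each of its descendants, a parent always sorts before its children under this ordering. One caveat: descendants are not necessarily contiguous with their parent. A sibling such as /compsci-old would sort between /compsci and /compsci/graphics, since '-' (0x2D) is smaller than '/' (0x2F).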