Re: Single string automaton causes NPE on Terms.intersect( CompiledAutomaton, BytesRef term )

Vishwas Jain Sat, 26 Mar 2016 13:04:08 -0700

Hello,

We are trying to implement better compression techniques in the lucene54
codec of Apache Lucene. As far as we can tell, the lucene54 codec applies no
such compression to postings lists, although LZ4 compression is used for
stored fields. Does anyone know why postings lists are not compressed this
way, and which compression techniques would be beneficial if implemented?


Thanks

On Sat, Mar 26, 2016 at 5:24 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Hi José,
>
> Can you please open a Jira issue about this, and add a test case as a
> patch, if you can?  I think it's bad you hit an NPE!  Not sure how
> best to fix it, but we can iterate on the issue.
>
> Thanks!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Mar 25, 2016 at 7:11 PM, José Tomás Atria <jtat...@gmail.com>
> wrote:
> > Ok, digging a little more, I found that the problem mentioned above seems
> > to be caused by FieldReader overriding the intersect( CompiledAutomaton,
> > BytesRef ) method in Terms:
> > <https://lucene.apache.org/core/5_5_0/core/org/apache/lucene/index/Terms.html#intersect(org.apache.lucene.util.automaton.CompiledAutomaton,%20org.apache.lucene.util.BytesRef)>
> >
> > The overridden method checks whether the compiled automaton is
> > AUTOMATON_TYPE.NORMAL and, if it isn't, throws an IllegalArgumentException
> > and instructs one to use CompiledAutomaton.getTermsEnum( Terms ) instead:
> >     if (compiled.type != CompiledAutomaton.AUTOMATON_TYPE.NORMAL) {
> >       throw new IllegalArgumentException("please use CompiledAutomaton.getTermsEnum instead");
> >     }
> >
> > which, of course, works perfectly, so I'm doing that now and the problem
> > is no more.
> >
> > However, the method in FieldReader just assumes that the compiled
> > automaton is AUTOMATON_TYPE.NORMAL, which causes the above NPE, because
> > the runAutomaton of a non-normal CompiledAutomaton is set to null in the
> > constructor, lines 191 to 209:
> >
> > IntsRef singleton = Operations.getSingleton(automaton);
> >
> > if (singleton != null) {
> >   // matches a fixed string
> >   type = AUTOMATON_TYPE.SINGLE;
> >   commonSuffixRef = null;
> >   runAutomaton = null; // <- HERE!
> >   this.automaton = null;
> >   this.finite = null;
> >
> >   if (isBinary) {
> >     term = StringHelper.intsRefToBytesRef(singleton);
> >   } else {
> >     term = new BytesRef(UnicodeUtil.newString(singleton.ints, singleton.offset, singleton.length));
> >   }
> >   sinkState = -1;
> >   return;
> > }
> >
> > Not to pretend I have any idea of what I'm talking about, but given that
> > the user has relatively little control over which implementation of Terms
> > we get at runtime (this user at least), shouldn't the overriding method in
> > FieldReader also check the AUTOMATON_TYPE and throw an equally informative
> > IllegalArgumentException instead, just for the sake of consistency?
> >
> > Sorry if all of the above is a little off topic for this list :)
> >
> > Best,
> > jta
> >
> >
> > On Fri, Mar 25, 2016 at 4:33 PM, José Tomás Atria <jtat...@gmail.com>
> wrote:
> >
> >> Hello again!
> >>
> >> I'm playing around some more with Lucene's automata, and I've bumped into
> >> something unexpected but can't figure out if it's a bug or an error on my
> >> part.
> >>
> >> Briefly: is it possible to use a single string automaton (i.e. the result
> >> of Automata.makeString( String ) ) to intersect a Terms instance? I keep
> >> getting NPEs on every attempt at doing this, e.g. with this code:
> >>
> >> // where "term" is a term known to exist in someField
> >> CompiledAutomaton ca = new CompiledAutomaton( Automata.makeString( "term" ) );
> >> Terms terms = leafReader.terms( someField );
> >> TermsEnum tEnum = terms.intersect( ca, null );
> >>
> >> results in:
> >> Exception in thread "main" java.lang.NullPointerException
> >> at org.apache.lucene.codecs.blocktree.IntersectTermsEnum.<init>(IntersectTermsEnum.java:127)
> >> at org.apache.lucene.codecs.blocktree.FieldReader.intersect(FieldReader.java:185)
> >>
> >> I assume I'm doing something wrong (I am aware that using an automaton
> >> for a single term may be a bad idea, but bear with me), but the fact that
> >> it's throwing an NPE prompted me to come and ask...
> >>
> >> Maybe there's a problem with encodings?
> >>
> >> Any help greatly appreciated.
> >> jta.
> >>
> >> --
> >> entia non sunt multiplicanda praeter necessitatem
> >>
> >
> >
> >
> > --
> > entia non sunt multiplicanda praeter necessitatem
>
>
>
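
A minimal sketch of the consistency check José proposes in the quoted
message, written against simplified stand-in types rather than Lucene's real
classes. AutomatonType, CompiledStub and this intersect() are hypothetical
names used for illustration only; the one thing taken from the thread is the
exception message. The idea is that the overriding method rejects non-NORMAL
automata up front instead of dereferencing a null runAutomaton:

```java
// Stand-in model only: none of these types are Lucene classes.
final class IntersectGuardSketch {

    // Simplified stand-in for CompiledAutomaton.AUTOMATON_TYPE.
    enum AutomatonType { SINGLE, NORMAL }

    // Stand-in for CompiledAutomaton: runAutomaton is null unless the type is
    // NORMAL, mirroring the constructor excerpt quoted in the thread.
    static final class CompiledStub {
        final AutomatonType type;
        final Object runAutomaton;

        CompiledStub(AutomatonType type) {
            this.type = type;
            this.runAutomaton = (type == AutomatonType.NORMAL) ? new Object() : null;
        }
    }

    // Hypothetical overriding intersect(): fail fast with an informative
    // exception rather than hitting an NPE further down.
    static String intersect(CompiledStub compiled) {
        if (compiled.type != AutomatonType.NORMAL) {
            throw new IllegalArgumentException(
                "please use CompiledAutomaton.getTermsEnum instead");
        }
        // Safe to dereference here: in this model NORMAL always has a runAutomaton.
        return "intersecting with " + compiled.runAutomaton.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(intersect(new CompiledStub(AutomatonType.NORMAL)));
        try {
            intersect(new CompiledStub(AutomatonType.SINGLE));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a guard like this, a caller holding a single-string automaton would get
the same hint from either implementation and could switch to
CompiledAutomaton.getTermsEnum( Terms ), which the thread reports as working.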
