Re: [Fwd: Re: 2.9, 3.0 and deprecation]

patrick o'leary Mon, 15 Dec 2008 12:14:41 -0800

Hey Jason

o.a.l.s.trie looks interesting and has a lot of potential, but the locallucene 1.5+ releases moved to a Cartesian tier system, away from
the bounding box filter, a while ago.

A TierRange or RangeFilter like the one I used in v1.0 was a little inefficient, as you have to do a bitwise AND on two range lookups,
e.g.

RangeFilter(min-latitude, max-latitude) AND RangeFilter(min-longitude, max-longitude)
(I extended the Filter class with an ISerialChainFilter to improve performance)
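
For illustration, the "two range lookups ANDed together" approach can be sketched in plain Java. This is only a minimal sketch, not the locallucene v1.0 code or Lucene's RangeFilter: the class RangeAndSketch, its methods, and the in-memory lat/lng arrays are assumptions made up for this example, with java.util.BitSet standing in for the per-filter doc id bit sets.

```java
import java.util.BitSet;

public class RangeAndSketch {
    /** Marks every doc whose value lies in [min, max]; stands in for one range filter. */
    static BitSet rangeBits(double[] values, double min, double max) {
        BitSet bits = new BitSet(values.length);
        for (int doc = 0; doc < values.length; doc++) {
            if (values[doc] >= min && values[doc] <= max) {
                bits.set(doc);
            }
        }
        return bits;
    }

    /** Bounding box = latitude range AND longitude range (two full scans plus a bitwise AND). */
    static BitSet boundingBox(double[] lat, double[] lng,
                              double minLat, double maxLat,
                              double minLng, double maxLng) {
        BitSet result = rangeBits(lat, minLat, maxLat);
        result.and(rangeBits(lng, minLng, maxLng));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: one entry per doc id.
        double[] lat = {38.9, 40.7, 34.0, 39.0};
        double[] lng = {-77.0, -74.0, -118.2, -120.0};
        System.out.println(boundingBox(lat, lng, 38.0, 41.0, -78.0, -73.0)); // prints {0, 1}
    }
}
```

Each range is evaluated over the whole index before the AND, which is where the extra cost comes from.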

The 1.5+ version of locallucene does it differently: I pre-generate the bounding shape's Cartesian IDs, i.e. all the boxes that make up
the overall bounding box, and simply pull the matching doc IDs out of the TermEnumerator.

Take a look at the CartesianShapeFilter http://locallucene.svn.sourceforge.net/viewvc/locallucene/trunk/locallucene/src/java/com/pjaol/search/geo/utils/CartesianShapeFilter.java?revision=66&view=markup

This gives you a bounding box lookup of about 3 - 4 ms on a 3 million doc index.
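
A rough standalone sketch of the box id idea, for readers who want the shape of it without reading the filter source. It is not the CartesianShapeFilter code: the class CartesianBoxSketch, the fixed one-degree cell size, the "row_col" id format, and the HashMap standing in for the term dictionary and postings are all assumptions for illustration. It also ignores wrap-around at the date line.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CartesianBoxSketch {
    // Hypothetical fixed cell size in degrees; a real tier system chooses it per tier.
    static final double CELL = 1.0;

    // Stand-in for the index: box id term -> doc ids carrying that term.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    static long row(double lat) { return (long) Math.floor((lat + 90.0) / CELL); }
    static long col(double lng) { return (long) Math.floor((lng + 180.0) / CELL); }

    /** Index time: each doc is tagged with the id of the box it falls in. */
    void add(int doc, double lat, double lng) {
        String id = row(lat) + "_" + col(lng);
        postings.computeIfAbsent(id, k -> new ArrayList<>()).add(doc);
    }

    /** Search time: pre-generate the ids of all boxes that cover the bounding box. */
    static List<String> coveringBoxes(double minLat, double maxLat,
                                      double minLng, double maxLng) {
        List<String> ids = new ArrayList<>();
        for (long r = row(minLat); r <= row(maxLat); r++) {
            for (long c = col(minLng); c <= col(maxLng); c++) {
                ids.add(r + "_" + c);
            }
        }
        return ids;
    }

    /** Union the doc ids of every covering box; no range scan or AND is needed. */
    BitSet search(double minLat, double maxLat, double minLng, double maxLng) {
        BitSet hits = new BitSet();
        for (String id : coveringBoxes(minLat, maxLat, minLng, maxLng)) {
            List<Integer> docs = postings.get(id);
            if (docs != null) {
                for (int doc : docs) {
                    hits.set(doc);
                }
            }
        }
        return hits;
    }
}
```

Note that boxes on the edge can contain docs slightly outside the exact bounding box, so a sketch like this would still need a precise distance check on the hits.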

Thanks
Patrick

Sean Timm wrote:

Subject: Re: 2.9, 3.0 and deprecation
From: "Jason Rutherglen" <[email protected]>
Date: Mon, 15 Dec 2008 12:29:38 -0500
To: <[email protected]>

About LocalLucene, it would benefit (be faster) by integrating with TrieRangeQuery for the bounding box filter.

On Sun, Dec 14, 2008 at 3:54 AM, Michael McCandless <[email protected]> wrote:

I'd also personally like to see 2.9 released sooner rather than later,
maybe early-ish next year?

I don't think we should hold up 2.9 for some of the big items below
(eg Fieldable/AbstractField/Field cleanup), especially if they have
not even been started yet.

One question: I'm assuming after 2.9 is out, we would fairly quickly
follow that up with a 3.0 that has more or less just removed deprecations?
(Vs. doing a lot of dev putting new features into 3.0 as well.)

More comments below:

Grant Ingersoll wrote:

1. Splitting Index time Document from Search time Document per Hoss' ideas on a variety of threads in the past. Something to the gist of having an InputDocument and an OutputDocument (and maybe an abstract Document for shared features) such that people wouldn't be confused about calling index time things on Document during search and vice versa.

Maybe don't hold 2.9 for this one? (There's been lots of discussion, and also recently interesting discussion on adding type safety to Document under LUCENE-831, but nothing yet concrete).

2. Java 1.5 (who knows, maybe by 2020, we can be on 1.6!). This means we can use Generics, or as I like to call them "Specifics" since they specifically say what is in the collection as opposed to the current collections where you can put any generic object in them. :-)

We get this one "for free" ;)

3. Michael B. is proposing a new Token API (but it's back compat.)

Already in and quite a big cutover.

4. Mike M. is doing some new flex indexing stuff (but it's back compat.)

I would not hold 3.0 for this; it's still big & exploratory at this point.

5. Is there anything we would need to deprecate now if we were to take advantage of 1.5 concurrent packages?

Has anyone looked into this?

6. Local Lucene is of interest to a lot of people. Does it require anything special in terms of deprecation? (me thinks not)

Any more clarity on this? I would also assume not.

7. Same goes for the real time stuff, PFOR implementation, column-stride fields, etc.

I think we tackle real-time needs one by one (e.g. LUCENE-1484, removing sync on IndexReader.document(), which I'm working on now, doesn't deprecate anything). PFOR I think is blocked by flex indexing. Column-stride fields would be great to get in, but they haven't seemed to have any forward motion for quite a while...

8. I think we should do a review of what's open in JIRA again and see if we can come to conclusions on any of them, such that going into 3.0, JIRA is relatively clean.

I've been trying over time to mark things as fix version 2.9, so we are at least forced to review them come 2.9.

9. For 3.0, what cruft from 1.x can we remove from the file format, since 3.x need not read 1.x format _if_ doing so is advantageous to us?

I'm not sure offhand. There's a lot of scary cruft in SegmentInfo.files(), but it's from 2.0 -> 2.1 so we need to keep it for now (to remove in 4.0, in 2020 ;) ).

10. There has been some talk about changing how StandardTokenizer labels some tokens. What can we do in there to deprecate?

I think we need a more incremental approach, somehow, for StandardTokenizer. Like it does its own internal versioning or something. There have been lots of little cases over time where it needs fixing, yet, it would be a break in back compat to fix them.

11. Fieldable. Ah, Fieldable. I believe this is going to become an abstract base class, or go away.

This is a biggie and nobody's stepped up so far to tackle it... I would say don't hold up 2.9 for this.

Maybe add these ones:

12. LUCENE-1483 -- running Scorer & HitCollector "per segment". We are making good progress here, and uncovering some nice per-query performance wins plus much faster searcher warming (since FieldCache is only used per-segment). On the current path it looks likely to deprecate current Field sorting classes, so it'd be great to get this in before 2.9.

13. LUCENE-831 (new FieldCache API). This is long standing and there's a fair amount of interest, and through our iterations with LUCENE-1483 (one of the primary users of the FieldCache API, field sorting) we are getting more clarity on what a new FieldCache API should look like. It'd be nice to resolve before 2.9, and I'd like to spend time doing so (after / with LUCENE-1483).

Mike
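
To make the per-segment idea in item 12 concrete, here is a minimal sketch in plain Java. It does not show the LUCENE-1483 patch or any Lucene API: Segment, SegmentCache, and search are hypothetical names, and an int array stands in for a field's values. The point it tries to show is that hits are collected one segment at a time using a doc base offset, and that cached values are keyed by segment, so a reopened searcher only loads values for segments it has not seen before.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerSegmentSketch {
    /** Hypothetical immutable segment: a name plus one value per local doc id. */
    static final class Segment {
        final String name;
        final int[] storedValues;
        Segment(String name, int[] storedValues) {
            this.name = name;
            this.storedValues = storedValues;
        }
    }

    /** Stand-in for a per-segment field cache: values are loaded once per segment. */
    static final class SegmentCache {
        private final Map<String, int[]> cache = new HashMap<>();
        int loads = 0; // counts how many segments had to be loaded

        int[] get(Segment segment) {
            int[] values = cache.get(segment.name);
            if (values == null) {
                values = segment.storedValues.clone(); // simulate the expensive load
                cache.put(segment.name, values);
                loads++;
            }
            return values;
        }
    }

    /** Collects global doc ids whose value is >= threshold, one segment at a time. */
    static List<Integer> search(List<Segment> segments, SegmentCache cache, int threshold) {
        List<Integer> hits = new ArrayList<>();
        int docBase = 0; // offset of the current segment's first doc in the global id space
        for (Segment segment : segments) {
            int[] values = cache.get(segment);
            for (int local = 0; local < values.length; local++) {
                if (values[local] >= threshold) {
                    hits.add(docBase + local);
                }
            }
            docBase += values.length;
        }
        return hits;
    }
}
```

If a new segment is appended and the same SegmentCache is reused, only that one segment is loaded, which is the intuition behind the faster warming mentioned above.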


Patrick O'Leary

AOL Local Search Technologies
Phone: + 1 703 265 8763


You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles.
 Do you understand this? 
And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat.
  - Albert Einstein
