On Aug 10, 2009, at 6:28 PM, Mark Miller wrote:
Grant Ingersoll wrote:
On Aug 10, 2009, at 5:12 PM, Shai Erera wrote:
Maybe we should follow what I seem to read from Earwin and Grant -
come up w/ real use cases, try to implement them w/ the current
API, then if it's impossible, discuss how we can make the current
API more adaptive. If at the end of this we'll get back to the new
API, then we'll at least feel better about it, and more convinced
it is the way to go.
Well, I have real use cases for it, but all of it is still missing
the biggest piece: search side support. It's the 900 lb. elephant
in the room. The 500 lb. elephant is the fact that all these
attributes, AIUI, require you to hook in your own indexing chain,
etc. in order to even be indexed, which is all package private
stuff. It's not even clear to me what happens right now if you
were to, say, have a TokenStream that had only one Attribute
on it and none of the existing attributes (term buffer, length,
position, etc.). Please correct me if I am wrong, I still don't
have a deep understanding of it all.
Michael has always been up front that this new API is in preparation
for flexible indexing. It doesn't give us the goodness - he has laid
out the reasons for moving before the goodness comes more than once
I think. From my understanding, Michael looked at what Mike was
doing in one of his flexible indexing patches, wondered how some of
the TokenStream stuff was going to work well with it, and came up
with this new API as a solution. Yes - it gets us nothing now. But
it's a big move, and there is no need to do everything at once - in
fact it would probably be harder to do it all at once - the rest has
always been on the table. 3.0 has always been convenient to push it
before, as deprecations can then be removed. Nothing forcing us to
make that decision now though.
Honestly, though, it really gives you very little over the current,
well functioning payloads capability other than stronger typing,
the ability to pick only those attributes that you want indexed (in
theory) and a byte (or so) of savings per any token that has a
payload, and we _HAVE_ right now, search support for payloads.
Payloads gives us nothing as developers - you can't use that
functionality without taking it from the users - payloads are for
users.
Flexible indexing will lead to all kinds of little cool things - the
likes of which have been discussed a lot in older emails. It will
likely lead to things we cannot predict as well. Everything will be
more flexible. It also could play a part in CSF, and work on
allowing custom files to plug into merging. Plus everything else
that's been mentioned (pfor, etc.). I've been sold on the long term
benefits. I don't think you need these APIs for them, but it's my
understanding it helps solve part of the equation.
A bunch of issues have come up. To my knowledge, they have been
addressed with vigor every time. If someone is unhappy with how
something has been addressed, and it needs to be addressed further,
please speak up.
Um, that's what I've been doing. Vigor is good. I very much
appreciate everyone's work. From what I can tell, most devs here are
unsure at best what to do with their existing Analyzer capabilities.
I've actually implemented a couple of new TokenFilters using the new
APIs. I like that aspect of it. I'm just not sure on the back compat
hoops (and yes, I asked for them). But I'm also operating under the
assumption that our BC approach isn't going to change anytime soon,
such that it is very important that these new capabilities are worked
out (and I don't just mean little performance nicks here and there, I
mean in terms of usability and performance).
Let's put it this way: We expect to release 2.9 within the month
(which is very short in Lucene time). That will give us a sum total
of, what, 2.5 weeks of review by devs for some very major changes? I
want 3.0 as much as anyone (I've been pushing for 1.5 support for at
least 2 years now), but I don't want us to be in a hole going into it
because we felt rushed right when the "finish" was so close.
Otherwise, I don't think the sky is falling - I think the new API is
being shaken out.
I agree it's not falling. It never is. This is in fact how the
process works. People are doing the right thing here by discussing it
and working on it.
Oh, and now it seems the new QP is dependent on it all.
Dependent how?
Attribute and a whole slew of AttributeImpls.