RE: Why release 3.0?

Uwe Schindler Mon, 16 Nov 2009 12:02:45 -0800

JFlex was not regenerated as far as I know, but if somebody did regenerate
it, it's already broken.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

  _____  

From: Robert Muir [mailto:[email protected]] 
Sent: Monday, November 16, 2009 8:53 PM
To: [email protected]
Subject: Re: Why release 3.0?

btw, here's a great example: StandardTokenizer is backwards-broken regardless
of the JVM you run it on, because we used a 1.4 JRE to run jflex for 2.9, but
1.5 for 3.0, right?

On Mon, Nov 16, 2009 at 2:51 PM, Robert Muir <[email protected]> wrote:

Uwe, that's probably a good solution, I think, just as long as we document it
somewhere.
I think there is already some warning verbiage in StandardTokenizer about
this:

NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
      the tokenizer, remember to use JRE 1.4 to run jflex (before
      Lucene 3.0).  This grammar now uses constructs (eg :digit:,
      :letter:) whose meaning can vary according to the JRE used to
      run jflex.  See
      https://issues.apache.org/jira/browse/LUCENE-1126 for details.
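
As a rough illustration of why the JRE matters here: assuming jflex resolves
:letter: and :digit: through java.lang.Character.isLetter() and
Character.isDigit() of the JRE that runs it (an assumption based on the note
above, not verified against the jflex source), the sketch below counts how many
BMP characters the running JRE puts in each class. The class name
JflexClassProbe is made up for this example. Running it under two different
JREs and comparing the counts shows whether the generated tokenizer could
differ.

```java
// Hypothetical probe: counts the BMP characters that the *running* JRE
// classifies as letters and digits. Sticks to Java 1.4-era APIs so it can be
// compiled and run on both old and new JREs.
class JflexClassProbe {

    // Number of chars in U+0000..U+FFFF for which Character.isLetter is true.
    static int countLetters() {
        int n = 0;
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            if (Character.isLetter((char) c)) {
                n++;
            }
        }
        return n;
    }

    // Number of chars in U+0000..U+FFFF for which Character.isDigit is true.
    static int countDigits() {
        int n = 0;
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            if (Character.isDigit((char) c)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("letters: " + countLetters());
        System.out.println("digits:  " + countDigits());
    }
}
```

If the counts differ between two JREs, a tokenizer generated under one may
split text differently from one generated under the other.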

On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <[email protected]> wrote:

But it is a general warning that should be placed in the Wiki: If you
upgrade from Java 1.4 to Java 5, think about reindexing.

It definitely has nothing to do with 3.0, because users could have switched
JVMs (and most of them have) before.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

  _____  

From: Robert Muir [mailto:[email protected]] 
Sent: Monday, November 16, 2009 8:45 PM

To: [email protected]
Subject: Re: Why release 3.0?

right, my point is that it's true it has nothing to do with Lucene at all,
really.

but the reality is we should clarify this to users, I think.

It's especially complex in the current StandardTokenizer, which uses a mix of
hardcoded ranges and properties. Can you tell me if you should reindex for a
given language X?
I wouldn't want to answer that question right now.

On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <[email protected]> wrote:

We tried out: Character.getType() for these two chars:

Java 5:
'\u00AD' = 16
'\u06DD' = 16

Java 1.4:
'\u00AD' = 20
'\u06DD' = 7

The first is the soft hyphen, the second is the Arabic end of ayah.
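
For anyone who wants to repeat the check on their own JVM, here is a minimal
sketch. The class name CharTypeProbe and its helper methods are made up for
this example. It only uses java.lang.Character.getType() and the
general-category constants defined on Character, where 16 is FORMAT, 20 is
DASH_PUNCTUATION and 7 is ENCLOSING_MARK.

```java
// Hypothetical probe: prints the general category that the *running* JVM
// assigns to the two characters discussed above. Uses only Java 1.4-era APIs
// so the same source can be run on old and new JVMs for comparison.
class CharTypeProbe {

    // Map the Character category constants relevant here to readable names.
    static String typeName(int type) {
        switch (type) {
            case Character.NON_SPACING_MARK: return "NON_SPACING_MARK";
            case Character.ENCLOSING_MARK:   return "ENCLOSING_MARK";
            case Character.FORMAT:           return "FORMAT";
            case Character.DASH_PUNCTUATION: return "DASH_PUNCTUATION";
            default:                         return "type " + type;
        }
    }

    // Format one char as "U+XXXX = <numeric type> (<constant name>)".
    static String describe(char c) {
        int type = Character.getType(c);
        String hex = Integer.toHexString(0x10000 | c).substring(1).toUpperCase();
        return "U+" + hex + " = " + type + " (" + typeName(type) + ")";
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println(describe('\u00AD')); // soft hyphen
        System.out.println(describe('\u06DD')); // Arabic end of ayah
    }
}
```

Running the same class under two JVMs and diffing the output is enough to see
whether these two characters changed category between them.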

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

  _____  

From: Robert Muir [mailto:[email protected]] 
Sent: Monday, November 16, 2009 8:37 PM

To: [email protected]
Subject: Re: Why release 3.0?

right, it's nothing to do with lucene; it's due to property changes, etc.

I just think we should inform users on java 1.4/2.9 that if they upgrade to
java 1.5/3.0, they should reindex.

the reason I say this about properties is that some of the ones that change
will affect tokenizers. I give two examples: a hyphen that changes from
punctuation to format (might affect Solr's WordDelimiterFilter),
and arabic ayah, which changes from NSM to format, which surely affects
ArabicLetterTokenizer.

On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <[email protected]> wrote:

Hi Robert,

I agree that the Unicode version supported by the JVM, as you say, really
has nothing to do with Lucene.

The disruption here is users' upgrading from Java 1.4 to 1.5+, not when they
upgrade Lucene.  I'd guess with few exceptions that most people have been
using Lucene with 1.5+ for a couple of years now, though.

But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact on
most Lucene users, assuming that most use Latin-1 exclusively; although I
haven't looked, I'd be surprised if Latin-1 characters changed much, if at
all, from Unicode 3.0 to 4.0.

It would be useful, I think, to include (a pointer to?) a description of the
details of the Unicode 3.0->4.0 differences in the Lucene 3.0 release notes,
since the minimum required Java version, and so also the supported Unicode
version, changes then.

Steve

On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> the problem is that the properties have changed for various characters,
> and new characters were added.
>
> it really has nothing to do with lucene, but the idea that you can go from
> jdk 1.4/lucene 2.9 to jdk 1.5/lucene 3.0 without reindexing is not true.
>
>
> On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <[email protected]> wrote:
>
>
>       But a UTF-8 stream from Java 1.4 can still be read with Java 5, so
> what is the problem? Java 5 extended Unicode support, but an index
> created with older versions can still be read. UTF-8 is standardized.
>
>
>
>       -----
>       Uwe Schindler
>       H.-H.-Meier-Allee 63, D-28213 Bremen
>       http://www.thetaphi.de
>       eMail: [email protected]
>
>
> ________________________________
>
>
>       From: Robert Muir [mailto:[email protected]]
>       Sent: Monday, November 16, 2009 8:09 PM
>
>       To: [email protected]
>       Subject: Re: Why release 3.0?
>
>
>
>       uwe, on this topic, please read my comment on LUCENE-1689. because the
> unicode version was bumped in jdk 1.5, I believe this index backwards
> compatibility is only theoretical.
>
>       On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <[email protected]> wrote:
>
>       2.9 does *not* have the same format as 3.0: an index created with 3.0
> cannot be read with 2.9. This is because compressed field support was
> removed and therefore the version number of the stored fields file was
> upgraded. But indexes from 2.9 can be read with 3.0, and that support may
> get removed in 4.0. 3.0 indexes can be read until version 4.9.
>
>
>
>       Uwe
>
>       -----
>       Uwe Schindler
>       H.-H.-Meier-Allee 63, D-28213 Bremen
>       http://www.thetaphi.de
>       eMail: [email protected]
>
>
> ________________________________
>
>
>       From: Jake Mannix [mailto:[email protected]]
>       Sent: Monday, November 16, 2009 7:15 PM
>
>
>       To: [email protected]
>
>       Subject: Re: Why release 3.0?
>
>
>
>       Don't users need to upgrade to 3.0 because 3.1 won't
> necessarily be able to read your
>       2.4 index file formats?  I suppose if you've already upgraded to
> 2.9, then all is well because
>       2.9 is the same format as 3.0, but we can't assume all users
> upgraded from 2.4 to 2.9.
>
>       If you've done that already, then 3.0 might not be necessary,
> but if you're on 2.4 right now,
>       you will be in for a bad surprise if you try to upgrade to 3.1.
>
>         -jake
>
>       On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
> <[email protected]> wrote:
>
>       One of my "specialties" is asking obvious questions just to see
> if everyone's assumptions are aligned. So with the discussion about
> branching 3.0 I have to ask "Is there going to be any 3.0 release
> intended for *production*?". And if not, would we save a lot of
> work by just not worrying about retrofitting fixes to a 3.0 branch
> and carrying on with 3.1 as the first *supported* 3.x release?
>
>       Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
> sure *as a user* I see a good reason to upgrade to 3.0. Getting a
> "beta/snapshot" release to get a head start on cleaning up my code
> does seem worthwhile, if I have the spare time. And having a base
> 3.0 version that's not changing all over the place would be useful
> for that.
>
>       That said, I'm also not terribly comfortable with a "release"
> that's out there and unsupported.
>
>       Apologies if this has already been discussed, but I don't
> remember it. Although my memory isn't what it used to be (but
> some would claim it never was<G>)...
>
>       Erick

-- 
Robert Muir
[email protected]
