[ https://issues.apache.org/jira/browse/LUCENE-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989060#comment-16989060 ]
Erick Erickson commented on LUCENE-9080:
----------------------------------------

[~rcmuir] Thanks, that put me on a path to figure some things out. I'm still baffled, just in a different way.

tl;dr:
- It looks like there are a bunch of hand-edits that are unimportant. They should be fixed at the source though, if possible.
- There are a couple of hand-edits that should be fixed in the input source rather than the output.
-- See LUCENE-8683 and Nikolay Khitrin's comments/work for specific instances of hand-edits to java files that should be moved. [~dsmiley] [~sarowe] there are a couple of JIRAs mentioned in LUCENE-8683; I may be asking you to glance at the ones you worked on and see if you recall anything about those changes.
- We should upgrade javacc to 6.0; we're getting deprecated methods generated (I think).

LONG FORM:

I tried going to branch_8x, Java8 and spoofing the bits that download nfc.txt, nfkc.txt and nfkc_cf.txt to use what's already checked out. If I ignore all the obvious hand-edits, checksum differences and bogus imports, here's what's still weird:

- HTMLCharEntites.jflex acquires an added pair of parens near the end: (' | "zwj" | "zwnj"', ')')

Several binary files show differences, but whether that's just my IDE not being able to deal with the charsets, IDK.
- org/apache/lucene/analysis/ja/dict/TokenInfoDictionary$fst.dat
- org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$fst.dat
- org/apache/lucene/analysis/icu/utr30.nrm is different
- 9 test binary files en-test-*.bin

TestICUFoldingFilterFactory still fails, here's one:
- ant test -Dtestcase=TestICUFoldingFilterFactory -Dtests.method=testBogusArguments -Dtests.seed=311B6E926642DA19 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=kn-IN -Dtests.timezone=Europe/Tirane -Dtests.asserts=true -Dtests.file.encoding=UTF-8

with this error:

Caused by: java.io.IOException: ICU data file error: Header authentication failed, please check if you have a valid ICU data file; data format 4e726d32, format version 4.0.0.0

I see jflex was upgraded, but I don't think regenerate was run after that.

This is virtually identical to what I get when trying this on master, pulling down new nfkc*.txt files. Well, I'm not guaranteeing the binary files are identical. There are a few other differences, like:

exptokseq[i] = jj_expentries.get(i); (old)
vs.
exptokseq[i] = (int[])jj_expentries.get(i); (new, hand edit I think?)

and these files aren't present in master at all; they're "Untracked" according to Git:
lucene/core/src/java/org/apache/lucene/util/packed/Direct*.java
lucene/core/src/java/org/apache/lucene/util/packed/Packed*ThreeBlocks.java

The TestICUFoldingFilterFactory still fails.

> "ant regenerate" fails on master
> --------------------------------
>
>                 Key: LUCENE-9080
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9080
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>         Attachments: after_regen.patch, before_regen.patch, status.res
>
>
> The root cause is that RamUsageEstimator.NUM_BYTES_INT has been removed and
> the python scripts still reference it in the generated scripts. That part's
> easy to fix.
> Last time I looked, though, the regenerate produces some differences in the
> generated files that should be looked at to ensure they're benign.
> Not really sure whether this should be a Lucene or Solr JIRA.
> Putting it in Lucene since one of the failed files is:
> lucene/core/src/java/org/apache/lucene/util/packed/Packed8ThreeBlocks.java
> I do know that one of the Solr jflex-produced files has an unexplained
> difference so it may bleed over.
> "ant regenerate" needs about 24G on my machine FWIW.
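
On the binary files listed in the comment (the two TokenInfoDictionary$fst.dat files and utr30.nrm): one way to take the IDE's charset handling out of the picture is to compare the files as raw bytes. Below is a minimal sketch, assuming the trees from before and after "ant regenerate" are available as two separate directories. The directory names `checkout_before` and `checkout_after` and the helper names `sha256_of` and `changed_files` are hypothetical and not part of the Lucene build; the relative paths are copied from the comment as written.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read as raw bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large dictionary files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(before_root, after_root, relative_paths):
    """Return the relative paths whose bytes differ between two trees.

    A file that is missing from either tree is reported as changed.
    """
    changed = []
    for rel in relative_paths:
        before = Path(before_root) / rel
        after = Path(after_root) / rel
        if not (before.is_file() and after.is_file()):
            changed.append(rel)
        elif sha256_of(before) != sha256_of(after):
            changed.append(rel)
    return changed


if __name__ == "__main__":
    # Hypothetical directory names; point these at the two real trees.
    suspects = [
        "org/apache/lucene/analysis/ja/dict/TokenInfoDictionary$fst.dat",
        "org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$fst.dat",
        "org/apache/lucene/analysis/icu/utr30.nrm",
    ]
    for rel in changed_files("checkout_before", "checkout_after", suspects):
        print("differs:", rel)
```

A path printed here differs at the byte level, or is missing from one tree, regardless of how any editor renders it. This only says whether a file changed, not whether the change is benign.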