rlaehdals commented on issue #14645: URL: https://github.com/apache/lucene/issues/14645#issuecomment-3264235977
I have been looking into whether the skeleton.txt files can be removed.

Since JFlex already provides a built-in skeleton, it seems that skeleton.default.txt can safely be deleted. The main issue is skeleton.disable.buffer.expansion.txt: because JFlex has no built-in option to disable buffer expansion, the tests related to buffer size are currently failing.

As a temporary workaround, I post-process the code generated by the JFlexTask, using regex replacements to remove the buffer expansion logic. However, I am not certain whether this is the appropriate solution. I would greatly appreciate any feedback or suggestions for a better approach.

```
configure(project(":lucene:core")) {
  task generateStandardTokenizerInternal(type: JFlexTask) {
    description = "Regenerate StandardTokenizerImpl.java"
    group = "generation"

    jflexFile = file('src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.jflex')

    // NOTE: The following modifications in `doLast` are applied
    // after JFlex generates StandardTokenizerImpl.java.
    // These changes adjust buffer handling and error conditions.
    doLast {
      // Make the buffer size an instance field instead of a constant.
      ant.replace(
          file: file('src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.java'),
          encoding: "UTF-8",
          token: "private static final int ZZ_BUFFERSIZE =",
          value: "private int ZZ_BUFFERSIZE ="
          )

      def content = file('src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.java').text

      // Drop the buffer-growing block, up to (but not including) the refill comment.
      content = content.replaceAll(
          /\/\* is the buffer big enough\? \*\/[\s\S]*?(?=\/\* fill the buffer with new input \*\/)/,
          ''
          )

      // Return early when there is no room left in the fixed-size buffer.
      content = content.replaceAll(
          /int requested = zzBuffer\.length - zzEndRead;/,
          """int requested = zzBuffer.length - zzEndRead - zzFinalHighSurrogate;
    if (requested == 0) {
      return true;
    }"""
          )

      // A zero-length read is always an error once expansion is disabled.
      content = content.replaceAll(
          /if \(numRead == 0\) \{\s*if \(requested == 0\) \{[\s\S]*?\}\s*else \{[\s\S]*?\}\s*\}/,
          """if (numRead == 0) {
      throw new java.io.IOException(
          "Reader returned 0 characters. See JFlex examples/zero-reader for a workaround.");
    }"""
          )

      // Handle a trailing high surrogate when only a single char was read.
      content = content.replaceAll(
          /if \(numRead == requested\) \{[\s\S]*?zzFinalHighSurrogate = 1;[\s\S]*?\}/,
          """if (numRead == requested) { // We requested too few chars to encode a full Unicode character
          --zzEndRead;
          zzFinalHighSurrogate = 1;
          if (numRead == 1) {
            return true;
          }
        }"""
          )

      file('src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.java').text = content
    }
  }

  def generateStandardTokenizer = wrapWithPersistentChecksums(generateStandardTokenizerInternal, [
    andThenTasks: [ "applyGoogleJavaFormat" ],
    mustRunBefore: ["compileJava"]
  ])

  regenerate.dependsOn generateStandardTokenizer
}
```
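
To make the first `replaceAll` in the workaround easier to reason about, here is a minimal standalone Java sketch of the same pattern (Groovy's `replaceAll` delegates to `java.util.regex`). The `SAMPLE` text is a hypothetical, abbreviated stand-in for the generated refill code, not the real JFlex output, and the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class StripBufferExpansion {
  // Lazily matches from the "is the buffer big enough?" comment up to,
  // but not including, the "fill the buffer with new input" comment.
  private static final Pattern EXPANSION_BLOCK =
      Pattern.compile(
          "/\\* is the buffer big enough\\? \\*/[\\s\\S]*?"
              + "(?=/\\* fill the buffer with new input \\*/)");

  // Hypothetical, abbreviated stand-in for the generated code (assumption).
  static final String SAMPLE =
      "/* is the buffer big enough? */\n"
          + "if (zzCurrentPos >= zzBuffer.length) {\n"
          + "  /* grow the buffer here */\n"
          + "}\n"
          + "/* fill the buffer with new input */\n"
          + "int requested = zzBuffer.length - zzEndRead;\n";

  // Removes the expansion block; returns the input unchanged if nothing matches.
  static String stripExpansion(String source) {
    return EXPANSION_BLOCK.matcher(source).replaceAll("");
  }

  public static void main(String[] args) {
    System.out.print(stripExpansion(SAMPLE));
  }
}
```

Because the end marker sits in a lookahead, it survives the replacement, so the later substitutions can still anchor on the code that follows it. One caveat with this style of post-processing: if a future JFlex release rewords either comment, the pattern matches nothing and the file is left unchanged without any error, so a check that each replacement actually changed the content may be worth adding.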