[ https://issues.apache.org/jira/browse/LUCENE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603349#comment-16603349 ]
Jim Ferenczi commented on LUCENE-8476:
--------------------------------------

Thanks [~danmuzi] !
The new patch looks good, I'll commit shortly.

> Optimizations in UserDictionary (KoreanAnalyzer)
> ------------------------------------------------
>
>                 Key: LUCENE-8476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8476
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Namgyu Kim
>            Priority: Major
>              Labels: optimization, patch-available
>         Attachments: LUCENE-8476.patch, LUCENE-8476.patch
>
>
> -■ Bug fix-
> -1) BufferedReader's close method is not called.- *(Wrong check)*
> {code:java}
> // Line 57 method
> public static UserDictionary open(Reader reader) throws IOException {
>   BufferedReader br = new BufferedReader(reader);
>   String line = null;
>   List<String> entries = new ArrayList<>();
>   // text + optional segmentations
>   while ((line = br.readLine()) != null) {
>     ...
>   }
>   if (entries.isEmpty()) {
>     return null;
>   } else {
>     return new UserDictionary(entries);
>   }
> }{code}
> If you look at the code above, close() is never called on the "br"
> variable.
> As far as I know, BufferedReader can cause a +memory leak+ if its close
> method is not called.
> So I changed the code as below.
> {code:java}
> // Line 57 method
> public static UserDictionary open(Reader reader) throws IOException {
>   String line = null;
>   List<String> entries = new ArrayList<>();
>   // text + optional segmentations
>   try (BufferedReader br = new BufferedReader(reader)) {
>     while ((line = br.readLine()) != null) {
>       ...
>     }
>   }
>   if (entries.isEmpty()) {
>     return null;
>   } else {
>     return new UserDictionary(entries);
>   }
> }
> {code}
> I solved this problem with the
> "[try-with-resources|https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html]"
> statement, available since Java 7.
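
For readers who want to see the effect of the try-with-resources change quoted above, here is a minimal, self-contained sketch. It is not the Lucene code: ReaderCloseSketch, TrackingReader and readEntries are hypothetical names made up for this illustration, and because the loop body is elided ("...") in the issue, the sketch simply assumes that blank lines are skipped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Lucene.
final class ReaderCloseSketch {

  // Hypothetical Reader wrapper that records whether close() was called.
  static final class TrackingReader extends Reader {
    private final Reader in;
    boolean closed = false;

    TrackingReader(Reader in) {
      this.in = in;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
      return in.read(cbuf, off, len);
    }

    @Override
    public void close() throws IOException {
      closed = true;
      in.close();
    }
  }

  // Collects the non-blank lines. Skipping blank lines is an assumption,
  // since the real loop body is elided in the issue description.
  static List<String> readEntries(Reader reader) throws IOException {
    List<String> entries = new ArrayList<>();
    // try-with-resources closes br, and with it the wrapped reader,
    // whether the loop finishes normally or throws.
    try (BufferedReader br = new BufferedReader(reader)) {
      String line;
      while ((line = br.readLine()) != null) {
        if (!line.trim().isEmpty()) {
          entries.add(line);
        }
      }
    }
    return entries;
  }

  public static void main(String[] args) throws IOException {
    TrackingReader reader = new TrackingReader(new StringReader("alpha al pha\n\nbeta\n"));
    List<String> entries = readEntries(reader);
    System.out.println(entries.size() + " entries, reader closed: " + reader.closed);
  }
}
```

One design point worth keeping in mind: closing the BufferedReader also closes the Reader that the caller passed in, so with this pattern the caller can no longer use that reader after the call returns.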
>
> ■ Optimizations
> 1) Change from Collections.sort to List.sort (UserDictionary constructor)
> {code:java}
> // Line 82 method
> private UserDictionary(List<String> entries) throws IOException {
>   final CharacterDefinition charDef = CharacterDefinition.getInstance();
>   Collections.sort(entries,
>       Comparator.comparing(e -> e.split("\\s+")[0]));
>   PositiveIntOutputs fstOutput = PositiveIntOutputs.getSingleton();
>   ...
> }{code}
> List.sort in Java 8 is known to be faster than the existing Collections.sort.
> ([http://ankitsambyal.blogspot.com/2014/03/difference-between-listsort-and.html])
> So I changed the code as below.
> {code:java}
> // Line 82 method
> private UserDictionary(List<String> entries) throws IOException {
>   final CharacterDefinition charDef = CharacterDefinition.getInstance();
>   entries.sort(Comparator.comparing(e -> e.split("\\s+")[0]));
>   PositiveIntOutputs fstOutput = PositiveIntOutputs.getSingleton();
>   ...
> }{code}
>
> 2) Remove unnecessary null check (UserDictionary constructor)
> {code:java}
> // Line 82 method
> private UserDictionary(List<String> entries) throws IOException {
>   ...
>   String lastToken = null;
>   ...
>   for (String entry : entries) {
>     String[] splits = entry.split("\\s+");
>     String token = splits[0];
>     if (lastToken != null && token.equals(lastToken)) {
>       continue;
>     }
>     char lastChar = entry.charAt(entry.length()-1);
>     ...
> }{code}
> Looking at this part of the code,
> {code:java}
> if (lastToken != null && token.equals(lastToken)) {
>   continue;
> }{code}
> the null check for lastToken is unnecessary, because the equals method of
> the String class returns false when its argument is null.
> So I changed the code as below.
> {code:java}
> // Line 82 method
> private UserDictionary(List<String> entries) throws IOException {
>   ...
>   String lastToken = null;
>   ...
>   for (String entry : entries) {
>     String[] splits = entry.split("\\s+");
>     String token = splits[0];
>     if (token.equals(lastToken)) {
>       continue;
>     }
>     char lastChar = entry.charAt(entry.length()-1);
>     ...
> }{code}
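
As a rough illustration of the two optimizations quoted above (List.sort with a key comparator, and dropping the null check before String.equals), here is a small standalone sketch. SortAndDedupSketch and distinctTokens are hypothetical names and not part of Lucene. The point where lastToken is updated is also an assumption, because that part of the constructor is elided in the issue. The sketch makes no claim about the relative speed of the two sort calls; it only shows that the simplified code behaves as intended.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class; not part of Lucene.
final class SortAndDedupSketch {

  // Returns the distinct first tokens of the entries, in sorted order.
  static List<String> distinctTokens(List<String> entries) {
    // Copy first so the caller's list is left untouched.
    List<String> sorted = new ArrayList<>(entries);
    // Same comparator as in the issue: order entries by their first token.
    sorted.sort(Comparator.comparing(e -> e.split("\\s+")[0]));

    List<String> tokens = new ArrayList<>();
    String lastToken = null;
    for (String entry : sorted) {
      String token = entry.split("\\s+")[0];
      // String.equals(null) returns false, so on the first iteration
      // (lastToken == null) this check simply does not match.
      if (token.equals(lastToken)) {
        continue;
      }
      tokens.add(token);
      lastToken = token; // assumption: the elided code updates lastToken like this
    }
    return tokens;
  }

  public static void main(String[] args) {
    List<String> entries = Arrays.asList("beta", "alpha al pha", "beta be ta", "alpha");
    System.out.println(distinctTokens(entries)); // prints [alpha, beta]
  }
}
```

The receiver of equals (token) is never null here because it comes from split, which is what makes removing the explicit check safe; the same would not hold if the call were written as lastToken.equals(token).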