NehanPathan commented on code in PR #1154:
URL: https://github.com/apache/lucenenet/pull/1154#discussion_r2058857674
##########
src/Lucene.Net.Analysis.SmartCn/Hhmm/BigramDictionary.cs:
##########
@@ -286,37 +304,37 @@ public virtual void LoadFromFile(string dctFilePath)
                 int j = 0;
                 while (j < cnt)
                 {
-                    dctFile.Read(intBuffer, 0, intBuffer.Length);
-                    buffer[0] = ByteBuffer.Wrap(intBuffer).SetOrder(ByteOrder.LittleEndian)
-                        .GetInt32();// frequency
-                    dctFile.Read(intBuffer, 0, intBuffer.Length);
-                    buffer[1] = ByteBuffer.Wrap(intBuffer).SetOrder(ByteOrder.LittleEndian)
-                        .GetInt32();// length
-                    dctFile.Read(intBuffer, 0, intBuffer.Length);
-                    // buffer[2] = ByteBuffer.wrap(intBuffer).order(
-                    //     ByteOrder.LITTLE_ENDIAN).getInt();// handle
+                    // LUCENENET: Use BinaryReader to decode little endian instead of ByteBuffer, since this is the default in .NET
+                    buffer[0] = reader.ReadInt32(); // frequency
+                    buffer[1] = reader.ReadInt32(); // length
+                    buffer[2] = reader.ReadInt32(); // Skip handle value (unused)
                     length = buffer[1];
-                    if (length > 0)
+                    if (length > 0 && length <= MAX_VALID_LENGTH && dctFile.Position + length <= dctFile.Length)

Review Comment:
   Hi, regarding the `maxLength` check:

   - The `maxLength` limit was originally used to restrict the length of words read from the dictionary file, most likely to guard against overly large or corrupted entries. This constraint is not needed in the current implementation: there is no upstream requirement for it, and the test suite passes without it.
   - As such, we have removed the `maxLength` check for now. If it becomes necessary in the future (for example, to limit word sizes or to handle specific use cases), it can easily be reintroduced.
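   For readers following along on the list, here is a minimal, self-contained sketch of the guarded read pattern the diff applies. It assumes a seekable stream; `GuardedDictRead`, `ReadEntry`, `MaxValidLength`, and the value `1000` are illustrative stand-ins, not the actual names or bound in `BigramDictionary.cs`:

   ```csharp
   using System.IO;

   // Illustrative sketch only: the class, method, constant name, and its
   // value are hypothetical stand-ins for the code under review.
   internal static class GuardedDictRead
   {
       private const int MaxValidLength = 1000; // assumed bound, not the PR's constant

       // Reads one record: frequency, length, handle (unused), then the word bytes.
       // Returns null when the length field fails validation.
       internal static byte[] ReadEntry(BinaryReader reader)
       {
           Stream dctFile = reader.BaseStream;

           // BinaryReader.ReadInt32() always decodes little-endian in .NET,
           // matching the on-disk layout of the SmartCn dictionary files.
           int frequency = reader.ReadInt32();
           int length = reader.ReadInt32();
           int handle = reader.ReadInt32(); // read only to advance past the unused field

           // Reject non-positive or oversized lengths, and lengths that
           // would read past the end of the file.
           if (length <= 0 || length > MaxValidLength ||
               dctFile.Position + length > dctFile.Length)
           {
               return null;
           }

           return reader.ReadBytes(length);
       }
   }
   ```

   Because `BinaryReader.ReadInt32()` is little-endian on every platform, the `ByteBuffer.Wrap(...).SetOrder(ByteOrder.LittleEndian)` sequence carried over from the Java port is unnecessary, and the `Position`/`Length` guard rejects a corrupted `length` field before any word bytes are consumed.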