[ https://issues.apache.org/jira/browse/LUCENE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286226#comment-17286226 ]
Greg Miller commented on LUCENE-9480:
-------------------------------------

As per a discussion with Robert Muir on the dev list, I'm going to see if I can come up with a first pass for optimizing DataInput#skipBytes(). It won't be as aggressive as collapsing DataInput and IndexInput to start, but will be aimed at solving two issues:
1) avoiding the unnecessary copying of bytes around in order to skip, and
2) avoiding unnecessary garbage creation from each DataInput instance allocating its own skip buffer byte[].

> investigate slow DataInput.skipBytes
> ------------------------------------
>
>                 Key: LUCENE-9480
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9480
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Priority: Major
>
> Currently DataInput has skipBytes(), but IndexInput also adds seek().
> There isn't a clear reason about the differences in the two methods: why
> would you choose one over the other?
> It causes some performance issues: for example the default implementation
> actually reads bytes into a byte array and throws everything away. This is
> really silly for MMapDirectory: skipping bytes should only be a glorified
> {{+=}}.
> So when I look at latest LUCENE-9447 patch, I can't help but think a ton of
> waste is happening:
> * Maybe skipBytes() is only used because the stored fields compressor
> interface happens to take DataInput? Should it take IndexInput instead?
> * Should skipBytes() be overridden by MMapDirectory rather than delegating to
> super? doing real reads and byte array copies isn't free. It should be a
> {{+=}} with single bounds check.
> * Should we revisit having DataInput vs IndexInput at all? Maybe they should
> be collapsed into one thing?
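
To make the contrast in the issue concrete, here is a minimal Java sketch of the two skipping strategies: a generic fallback that reads into a scratch array and discards the bytes, and an override for a randomly addressable input that only advances a position after one bounds check. The names SequentialInput, ByteArrayInput, FastByteArrayInput and the bytesCopied counter are hypothetical and exist only for illustration; this is not Lucene's DataInput or IndexInput code, and the 1024-byte scratch size is an arbitrary assumption.

```java
import java.io.EOFException;
import java.io.IOException;

class SkipSketch {
    /** Hypothetical stand-in for a sequential input; NOT Lucene's DataInput. */
    abstract static class SequentialInput {
        abstract void readBytes(byte[] dst, int offset, int len) throws IOException;

        /** Generic fallback: read into a scratch array and throw the bytes away. */
        void skipBytes(long numBytes) throws IOException {
            if (numBytes < 0) {
                throw new IllegalArgumentException("numBytes must be >= 0, got " + numBytes);
            }
            // Scratch size of 1024 is an arbitrary choice for this sketch.
            byte[] scratch = new byte[(int) Math.min(1024, numBytes)];
            long skipped = 0;
            while (skipped < numBytes) {
                int step = (int) Math.min(scratch.length, numBytes - skipped);
                readBytes(scratch, 0, step); // real copy, result is discarded
                skipped += step;
            }
        }
    }

    /** In-memory input that inherits the copying skip. */
    static class ByteArrayInput extends SequentialInput {
        protected final byte[] data;
        protected int pos;
        protected long bytesCopied; // instrumentation for the sketch only

        ByteArrayInput(byte[] data) {
            this.data = data;
        }

        @Override
        void readBytes(byte[] dst, int offset, int len) throws IOException {
            if (len > data.length - pos) {
                throw new EOFException("read past end of input");
            }
            System.arraycopy(data, pos, dst, offset, len);
            pos += len;
            bytesCopied += len;
        }

        int position() {
            return pos;
        }

        long bytesCopied() {
            return bytesCopied;
        }
    }

    /** Same input, but skipping is a single bounds check plus an addition. */
    static class FastByteArrayInput extends ByteArrayInput {
        FastByteArrayInput(byte[] data) {
            super(data);
        }

        @Override
        void skipBytes(long numBytes) throws IOException {
            if (numBytes < 0) {
                throw new IllegalArgumentException("numBytes must be >= 0, got " + numBytes);
            }
            if (numBytes > data.length - pos) {
                throw new EOFException("skip past end of input");
            }
            pos += (int) numBytes; // no bytes are read or copied
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[4096];
        ByteArrayInput slow = new ByteArrayInput(data);
        ByteArrayInput fast = new FastByteArrayInput(data);
        slow.skipBytes(3000);
        fast.skipBytes(3000);
        System.out.println("slow copied " + slow.bytesCopied() + " bytes");
        System.out.println("fast copied " + fast.bytesCopied() + " bytes");
    }
}
```

Both inputs end up at the same position, but only the inherited skip touches the underlying bytes. The fallback here still allocates a scratch array on every call; sharing or avoiding that buffer is the second concern mentioned in the comment, and this sketch does not attempt it.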