stevedlawrence opened a new pull request #77: Modifications to IO layer to support streaming input data
URL: https://github.com/apache/incubator-daffodil/pull/77

- Modify the ByteBufferDataInputStream to no longer depend on ByteBuffers. It is now an InputSourceDataInputStream, and one can implement a new InputSource interface to provide bytes. This also changes how bitPos and bitLimit are stored to simplify the code (e.g. no offsets), and a bitLimit no longer has to be set, since not all inputs know their bit limit. Moves TLState members into the PState: the InputSourceDataInputStream is now created by the user, so it is not necessarily created in the thread from which it will be used, which breaks the ThreadLocal functionality.
- Create two InputSource implementations: one uses a ByteBuffer as the data store, and one uses a bucketing algorithm that supports files larger than Int.MaxValue and can free data that can no longer be backtracked to.
- Remove the DataLimits; most of these values weren't actually used. Instead, create new tunables for those that were and use them where appropriate.
- When we decode data to characters, we need to know exactly how many bits were used to decode each character. The Java decoders do not provide this information, which required a lot of complex code to keep track of it, and even then there were bugs. This creates our own Decoders that provide exactly the information we need and allow for further modifications that may be needed for things like dfdl:encodingErrorPolicy and dfdl:utf16Width.
- Remove the reporting and replacing decoders. Instead, we now have a single decoder that handles replacing/reporting based on the format info. This is another benefit of our custom Decoders.
- Modify the parse Scala/Java API to expect an InputSourceDataInputStream created by the API user. Other public API methods are deprecated and are modified to create an InputSourceDataInputStream behind the scenes. API functions are also simplified to no longer take a starting bit position or a bit limit. These were really only used for testing, and even then fairly rarely. Alternative methods are used to set these values where necessary.
- Add the --stream option to the CLI parse subcommand. When this is provided and there is leftover data at the end of a parse, the CLI performs a new parse continuing where the previous one left off.
- Modify the TDMLRunner to be based on java.io.InputStreams rather than Channels. It was already using streams for everything and then just wrapping them in a channel without providing any actual benefit.
- Fix an issue where isAtEnd sometimes does not return the correct value. It now queries the underlying data stream to determine whether there is more data, rather than relying on bitLimit, which might not always be set.

DAFFODIL-934, DAFFODIL-931, DAFFODIL-1065, DAFFODIL-1565
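
For illustration only, the bucketing InputSource described above could be sketched in Java roughly as follows. The class name BucketingSource and its methods get, release, and bucketsHeld are hypothetical and are not taken from this pull request; the actual implementation may differ. The sketch only shows the general idea: positions are tracked as long values so the input may exceed Int.MaxValue bytes, data is read lazily into fixed-size buckets, and whole buckets are dropped once the caller declares that it will never backtrack before a given position.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a bucketing input source; not Daffodil's actual code.
class BucketingSource {
    private final InputStream in;
    private final int bucketSize;
    private final List<byte[]> buckets = new ArrayList<>(); // buckets still held, oldest first
    private long firstBucket = 0; // absolute index of buckets.get(0)
    private long bytesRead = 0;   // total bytes pulled from the stream so far
    private boolean eof = false;

    BucketingSource(InputStream in, int bucketSize) {
        if (bucketSize <= 0) throw new IllegalArgumentException("bucketSize must be positive");
        this.in = in;
        this.bucketSize = bucketSize;
    }

    // Read from the stream until the byte at absolute position pos is buffered.
    // Returns false if the stream ends before reaching pos.
    private boolean fillTo(long pos) {
        try {
            while (pos >= bytesRead && !eof) {
                int offset = (int) (bytesRead % bucketSize);
                if (offset == 0) buckets.add(new byte[bucketSize]); // last bucket is full or absent
                byte[] last = buckets.get(buckets.size() - 1);
                int n = in.read(last, offset, bucketSize - offset);
                if (n < 0) eof = true; else bytesRead += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pos < bytesRead;
    }

    // Returns the byte at pos as 0..255, or -1 if the input ends before pos.
    int get(long pos) {
        if (pos < firstBucket * bucketSize) {
            throw new IllegalStateException("position " + pos + " was already released");
        }
        if (!fillTo(pos)) return -1;
        byte[] bucket = buckets.get((int) (pos / bucketSize - firstBucket));
        return bucket[(int) (pos % bucketSize)] & 0xFF;
    }

    // Declare that no position below pos will be read again, so that whole
    // buckets before it can be freed. The newest bucket is always kept.
    void release(long pos) {
        long keepFrom = pos / bucketSize;
        while (firstBucket < keepFrom && buckets.size() > 1) {
            buckets.remove(0);
            firstBucket++;
        }
    }

    int bucketsHeld() {
        return buckets.size();
    }
}

class BucketingSourceDemo {
    public static void main(String[] args) {
        byte[] data = {10, 20, 30, 40, 50, 60};
        BucketingSource src = new BucketingSource(new ByteArrayInputStream(data), 4);
        System.out.println(src.get(5)); // reads ahead into a second bucket
        System.out.println(src.get(1)); // backtracking is still possible here
        src.release(4);                 // bytes 0..3 may now be freed
        System.out.println(src.bucketsHeld());
    }
}
```

In this sketch, get returning -1 past the end of the input plays the same role as asking the underlying stream whether more data exists, rather than relying on a preset limit.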
