deshanxiao commented on issue #1475: URL: https://github.com/apache/orc/issues/1475#issuecomment-1521825385
> Yes, the order is fixed. This is implemented in the `recordPosition` call as below.
>
> In the `TreeWriterBase.java`, positions of present stream are recorded first.
>
> https://github.com/apache/orc/blob/792c3f820d0b7a64b27c9dc4c390443386e6baf0/java/core/src/java/org/apache/orc/impl/writer/TreeWriterBase.java#L369-L377
>
> And then in the `StringBaseTreeWriter.java`, positions of data stream and length stream are recorded in order.
>
> https://github.com/apache/orc/blob/9dbf833868591314014958cc58cd57fb1e8e739c/java/core/src/java/org/apache/orc/impl/writer/StringBaseTreeWriter.java#L265-L270
>
> I followed the same order when I was implementing the C++ writer so they should be consistent.

Thank you for sharing the Java code. I double-checked it and you are right, @wgtmac.

> In a direct-encoded string columns, DATA stream can be placed BEFORE or AFTER LENGTH stream. Same flexibility for PRESENT stream.

In fact, the implementations in different languages currently use different orders. In Java, the order in which streams are flushed to disk depends on the `compareTo` method.

> Even data streams of different columns can be interleaved.

Do you mean that streams of different columns can cross each other, like this:

col1 streamtype1
col2 streamtype1
col1 streamtype2

I noticed that the streams of the same column appear together, but the order of streams across different columns is uncertain, even when the columns have the same data type.
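To make the `compareTo` point concrete, here is a minimal sketch of how a comparator-driven flush could group streams by column. It is not the ORC writer code: `StreamOrderSketch`, `StreamKey`, `Kind`, and `flushOrder` are hypothetical names, and the assumed ordering (column first, then stream kind) is only an illustration. The real implementation may compare additional fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StreamOrderSketch {
    // Hypothetical stream kinds; declaration order is the assumed order within a column.
    enum Kind { PRESENT, DATA, LENGTH }

    // Hypothetical key identifying one stream of one column (not an ORC class).
    static final class StreamKey implements Comparable<StreamKey> {
        final int column;
        final Kind kind;

        StreamKey(int column, Kind kind) {
            this.column = column;
            this.kind = kind;
        }

        // Assumed ordering: compare by column id first, then by stream kind.
        @Override
        public int compareTo(StreamKey other) {
            int byColumn = Integer.compare(column, other.column);
            return byColumn != 0 ? byColumn : kind.compareTo(other.kind);
        }
    }

    // Returns the order in which a sorted map would hand the streams to the flush step.
    static List<String> flushOrder(List<StreamKey> registered) {
        // TreeMap sorts its keys with compareTo, regardless of insertion order.
        Map<StreamKey, byte[]> streams = new TreeMap<>();
        for (StreamKey key : registered) {
            streams.put(key, new byte[0]); // placeholder for buffered stream bytes
        }
        List<String> order = new ArrayList<>();
        for (StreamKey key : streams.keySet()) {
            order.add("col" + key.column + " " + key.kind);
        }
        return order;
    }

    public static void main(String[] args) {
        List<StreamKey> registered = List.of(
            new StreamKey(2, Kind.DATA),
            new StreamKey(1, Kind.LENGTH),
            new StreamKey(1, Kind.PRESENT),
            new StreamKey(2, Kind.LENGTH),
            new StreamKey(1, Kind.DATA));
        // Prints the col1 streams (PRESENT, DATA, LENGTH) and then the col2 streams (DATA, LENGTH).
        flushOrder(registered).forEach(System.out::println);
    }
}
```

Under this assumed comparator the streams of one column always end up adjacent. An implementation that sorts by kind before column, or does not sort at all, would produce the interleaved layout asked about above, which is why a reader should not rely on any particular physical order.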
