williamhyun opened a new pull request #672:
URL: https://github.com/apache/orc/pull/672
### What changes were proposed in this pull request?
This PR aims to fix a regression in reading column names that contain a dot character.
### Why are the changes needed?
Since ORC-696, we cannot read ORC files whose column names include a
dot. For example, the following test file, which has a column named `col.dot`, was read incorrectly.
```
% orc-tools meta core/src/test/resources/col.dot.orc
Processing data file core/src/test/resources/col.dot.orc [length: 235]
Structure for core/src/test/resources/col.dot.orc
File Version: 0.12 with ORC_517
Rows: 1
Compression: SNAPPY
Compression size: 262144
Calendar: Julian/Gregorian
Type: struct<`col.dot`:bigint>
Stripe Statistics:
Stripe 1:
Column 0: count: 1 hasNull: false
Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 0 max: 0 sum: 0
File Statistics:
Column 0: count: 1 hasNull: false
Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 0 max: 0 sum: 0
Stripes:
Stripe: offset: 3 data: 6 rows: 1 tail: 35 index: 35
Stream: column 0 section ROW_INDEX start: 3 length 11
Stream: column 1 section ROW_INDEX start: 14 length 24
Stream: column 1 section DATA start: 38 length 6
Encoding column 0: DIRECT
Encoding column 1: DIRECT_V2
File length: 235 bytes
Padding length: 0 bytes
Padding ratio: 0%
User Metadata:
org.apache.spark.version=3.1.1
________________________________________________________________________________________________________________________
```
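To illustrate why a dot inside a column name is easy to mishandle, here is a minimal sketch, not the ORC implementation, of a resolver that splits a dotted column path. It assumes that backtick quoting (as in the `Type:` line above) marks a name as a single segment and that a doubled backtick stands for a literal backtick. The function name `split_column_path` is hypothetical.

```python
def split_column_path(path):
    """Split a dotted column path, keeping backtick-quoted names whole.

    Illustrative only: the quoting rules here are assumptions, not a
    description of how ORC parses column names.
    """
    parts = []
    current = []
    in_quotes = False
    i = 0
    while i < len(path):
        ch = path[i]
        if ch == "`":
            if in_quotes and i + 1 < len(path) and path[i + 1] == "`":
                # A doubled backtick inside quotes is a literal backtick.
                current.append("`")
                i += 1
            else:
                # Enter or leave a quoted segment.
                in_quotes = not in_quotes
        elif ch == "." and not in_quotes:
            # An unquoted dot separates nested field names.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated backtick in column path")
    parts.append("".join(current))
    return parts
```

Under these assumptions, the unquoted string `col.dot` splits into two segments, `col` and `dot`, while the quoted form keeps `col.dot` as one name. A reader that applies the unquoted rule to a flat column named `col.dot` would look for a nested field that does not exist.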
### How was this patch tested?
Pass the CIs with the newly added test case.