GitHub user ddrinka opened a pull request:
https://github.com/apache/orc/pull/308
Deliver a lower-case schema to OrcFile
Mixed-case struct field names don't work in Hive. There should be a way to
convert a camel-cased JSON document into ORC without having to pre-process the
JSON.
This pull request is a proof of concept that generates two schemas: one that
keeps the original case and is provided to the JsonReader as usual, and
another that is lower-cased and is provided to OrcFile.
TypeDescription is immutable and non-trivial to clone manually using public
accessors, so to keep the idea clear, I do the conversion at schema ingest
rather than at the point where the schema is provided to OrcFile. The downside
of this approach is that automatic schema detection doesn't benefit from these
changes. A more experienced implementer could certainly do better.
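
As a rough illustration of the two-schema idea, here is a minimal sketch of a
recursive copy that lower-cases struct field names. It deliberately does not
use ORC's TypeDescription: the Node class, its fields, and the struct,
primitive, lowerCase and render helpers are all hypothetical stand-ins
invented for this example, and the code in the patch may look quite different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class LowerCaseSchema {
    /** Hypothetical stand-in for a schema node; this is NOT ORC's TypeDescription. */
    static final class Node {
        final String category;         // e.g. "struct", "string", "int"
        final List<String> fieldNames; // populated only for structs
        final List<Node> children;

        Node(String category, List<String> fieldNames, List<Node> children) {
            this.category = category;
            this.fieldNames = Collections.unmodifiableList(new ArrayList<>(fieldNames));
            this.children = Collections.unmodifiableList(new ArrayList<>(children));
        }
    }

    static Node primitive(String category) {
        return new Node(category, Collections.<String>emptyList(), Collections.<Node>emptyList());
    }

    static Node struct(List<String> names, List<Node> children) {
        return new Node("struct", names, children);
    }

    /** Builds a new tree with every field name lower-cased; the input is left untouched. */
    static Node lowerCase(Node node) {
        List<String> names = new ArrayList<>();
        for (String name : node.fieldNames) {
            names.add(name.toLowerCase(Locale.ROOT));
        }
        List<Node> children = new ArrayList<>();
        for (Node child : node.children) {
            children.add(lowerCase(child));
        }
        return new Node(node.category, names, children);
    }

    /** Renders a node in a schema-string style such as struct<a:int,b:string>. */
    static String render(Node node) {
        if (node.children.isEmpty()) {
            return node.category;
        }
        StringBuilder sb = new StringBuilder(node.category).append('<');
        for (int i = 0; i < node.children.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            if (!node.fieldNames.isEmpty()) {
                sb.append(node.fieldNames.get(i)).append(':');
            }
            sb.append(render(node.children.get(i)));
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        Node address = struct(Arrays.asList("zipCode"), Arrays.asList(primitive("int")));
        Node original = struct(Arrays.asList("firstName", "homeAddress"),
                               Arrays.asList(primitive("string"), address));
        Node lowered = lowerCase(original);
        System.out.println(render(original)); // schema the JSON reader would see
        System.out.println(render(lowered));  // schema the file writer would see
    }
}
```

In this sketch the original tree plays the role of the schema given to the
JsonReader, and the lower-cased copy plays the role of the schema given to
OrcFile. Because the copy is built from any schema tree, the same step could
in principle also be applied to an automatically detected schema.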
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ddrinka/orc ddrinka-pr-lowercase-schema
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/orc/pull/308.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #308
----
commit cc7e909725d059b69f9a8c384aca2691b52ce0ff
Author: Douglas Drinka <ddrinka@...>
Date: 2018-09-13T22:59:11Z
Deliver a lower-case schema to OrcFile
----