[GitHub] [incubator-druid] es1220 opened a new pull request #7282: Enhance orc-extensions - use orc file schema

GitBox Sun, 17 Mar 2019 19:12:35 -0700

es1220 opened a new pull request #7282: Enhance orc-extensions  - use orc file 
schema
URL: https://github.com/apache/incubator-druid/pull/7282
 
 
   `orc-extensions` currently relies on a custom struct `typeString`, which is 
either set in the user's configuration or generated automatically by the druid parser.
   
   `typeString` is brittle and easy to get wrong (for example, the column 
order or the column types).
   
   So I created **`DruidOrcNewInputFormat`** and the **`druid_orc`** parser type.
   Now, by changing only the `inputFormat` and the parser `type`, you can ingest 
orc files as easily as with `parquet-extensions`, without any `typeString` errors.
   
   
   - `DruidOrcNewInputFormat`
     - has `OrcNewInputFormat`
     - creates `DruidOrcRecordReader` and stores the file schema
   - `DruidOrcRecordReader` 
     - converts `OrcStruct` to `Map<String, Object>` using the stored file schema.
       (This logic was moved here from the existing `OrcHadoopInputRowParser`.)
   - `DruidOrcHadoopInputRowParser` 
     - converts `Map` to `MapBasedInputRow`.
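
The idea behind the record reader step can be illustrated with a minimal, library-free sketch. It is not the code in this pull request: the class `SchemaDrivenRowSketch`, the method `toMap`, and the sample field names are hypothetical, and plain Java lists stand in for the ORC file schema and for an `OrcStruct` row. The point it shows is that when field names come from the file itself, a row can be turned into a `Map<String, Object>` without a hand-written `typeString` that might list columns in the wrong order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in orc-extensions.
class SchemaDrivenRowSketch {

    /**
     * Pairs each value of a row with the field name at the same position
     * in the schema read from the file. Lists stand in for the real ORC
     * schema and row objects (an assumption made for this sketch).
     */
    public static Map<String, Object> toMap(List<String> fileFieldNames, List<Object> rowValues) {
        if (fileFieldNames.size() != rowValues.size()) {
            throw new IllegalArgumentException(
                "schema has " + fileFieldNames.size()
                    + " fields but row has " + rowValues.size() + " values");
        }
        // LinkedHashMap keeps the column order of the file schema.
        Map<String, Object> event = new LinkedHashMap<>();
        for (int i = 0; i < fileFieldNames.size(); i++) {
            event.put(fileFieldNames.get(i), rowValues.get(i));
        }
        return event;
    }

    public static void main(String[] args) {
        // Field names as they would be stored with the file, not typed by a user.
        List<String> fileFieldNames = Arrays.asList("timestamp", "page", "added");
        List<Object> rowValues = Arrays.asList("2019-03-17T00:00:00Z", "Druid", 42L);
        System.out.println(toMap(fileFieldNames, rowValues));
    }
}
```

A parser such as `DruidOrcHadoopInputRowParser` would then only need to turn a map like this into a `MapBasedInputRow`, as described in the list above.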



