[
https://issues.apache.org/jira/browse/DRILL-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Turton closed DRILL-8492.
-------------------------------
Resolution: Fixed
> Allow Parquet TIME_MICROS and TIMESTAMP_MICROS columns to be read as 64-bit
> integer values
> -------------------------------------------------------------------------------------------
>
> Key: DRILL-8492
> URL: https://issues.apache.org/jira/browse/DRILL-8492
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Parquet
> Affects Versions: 1.21.1
> Reporter: Peter Franzen
> Priority: Major
>
> When reading Parquet columns of type {{time_micros}} and
> {{timestamp_micros}}, Drill truncates the microsecond values to
> milliseconds in order to convert them to SQL timestamps.
> It is currently not possible to read the original microsecond values (as
> 64-bit values, not SQL timestamps) through Drill.
> One way to allow reading the original 64-bit values is to add two
> options, similar to "store.parquet.reader.int96_as_timestamp", that control
> whether microsecond times and timestamps are truncated to millisecond
> timestamps or read as non-truncated 64-bit values.
> These options would be added to {{org.apache.drill.exec.ExecConstants}} and
> {{org.apache.drill.exec.server.options.SystemOptionManager}}.
> They would also be added to "drill-module.conf":
> {{ store.parquet.reader.time_micros_as_int64: false,}}
> {{ store.parquet.reader.timestamp_micros_as_int64: false,}}
> These options would then be used in the same places as
> {{store.parquet.reader.int96_as_timestamp}}:
> * org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory
> * org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter
> * org.apache.drill.exec.store.parquet2.DrillParquetGroupConverter
> to create an int64 reader instead of a time/timestamp reader when the
> corresponding option is set to true.
> In addition to this,
> {{org.apache.drill.exec.store.parquet.metadata.FileMetadataCollector}} must
> be altered to _not_ truncate the min and max values for
> time_micros/timestamp_micros if the corresponding option is true. This class
> doesn’t have a reference to an {{OptionManager}}, so the two new options
> must be extracted from the {{OptionManager}} when the {{ParquetReaderConfig}}
> instance is created.
> Filtering on microsecond columns would be done using 64-bit values rather
> than TIME/TIMESTAMP values when the new options are true, e.g.
> {{SELECT * FROM <file> WHERE <timestamp_micros_column> = 1705914906694751;}}
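
For illustration only (this sketch is not part of the Jira issue or of Drill's code), the following Java snippet shows the precision loss the issue describes when a microsecond value is reduced to milliseconds. The class and method names are hypothetical, and the use of floor division is an assumption; Drill's actual conversion may differ. The sample value is the literal from the example query above.

```java
public class MicrosTruncationSketch {

    // Hypothetical helper: reduce a microsecond count to milliseconds,
    // discarding the sub-millisecond part (assumed to use floor division).
    static long microsToMillis(long micros) {
        return Math.floorDiv(micros, 1000L);
    }

    // Hypothetical helper: the microseconds discarded by the truncation above.
    static long lostMicros(long micros) {
        return Math.floorMod(micros, 1000L);
    }

    public static void main(String[] args) {
        // Literal taken from the example query in the issue description.
        long micros = 1705914906694751L;

        System.out.println("original micros:  " + micros);
        System.out.println("truncated millis: " + microsToMillis(micros));
        System.out.println("lost micros:      " + lostMicros(micros));
    }
}
```

After truncation the trailing 751 microseconds can no longer be recovered, which is why an equality filter on the full 64-bit value only makes sense when the column is read as int64.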