[
https://issues.apache.org/jira/browse/DRILL-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270207#comment-14270207
]
Jason Altekruse commented on DRILL-1965:
----------------------------------------
Two methods for doing this have been explored. It would be useful to have a
human-editable format so that test writers can write input files and baselines
themselves. JSON is currently the easiest such format, so I explored writing a
JSON file with numerics or strings that can be cast into all supported types.
The attached patch is the initial effort on this work. The interval types do
not appear to be casting correctly as they are currently specified; I think
it's just a formatting problem, but I am going to hand this off to Ramana for
further work and for generating the parquet files.
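To make the JSON idea concrete, here is a minimal sketch of how such an input
file could be generated. It is not taken from the attached patch: the column
names, the helper functions, and in particular the interval string format are
all assumptions made for illustration, and the interval format is exactly the
part that may need adjusting.

```python
import json
from datetime import date, timedelta


def build_records(count):
    """Build rows whose values are plain JSON numbers or strings.

    Every column name here is hypothetical; each value is meant to be
    cast to a target type by the query that reads the file.
    """
    base = date(2015, 1, 1)
    records = []
    for i in range(count):
        records.append({
            "int_col": i,
            "bigint_col": 2 ** 40 + i,
            "float8_col": i + 0.5,
            "varchar_col": "row-%d" % i,
            # Dates are written as ISO-8601 strings and cast on read.
            "date_col": (base + timedelta(days=i)).isoformat(),
            # Assumed interval formatting; may not be what the cast expects.
            "interval_col": "P%dD" % (i + 1),
        })
    return records


def to_json_lines(records):
    """Serialize one JSON object per line so the file stays hand-editable."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


if __name__ == "__main__":
    print(to_json_lines(build_records(3)))
```

Keeping one record per line means a test writer can add or change a single
row in a text editor without touching the rest of the file.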
The patch also includes some code to generate a physical plan that uses the
mock-scan operator. This was a workaround I was trying before I realized the
unsigned types are not fully implemented (there are references to them in the
code, but they cannot be cast to and are not currently supported). The attempt
did reveal some shortcomings in the generateTestData method of several of the
value vector types, such as date and timestamp.
> Expand read and write testing for parquet across all supported types
> --------------------------------------------------------------------
>
> Key: DRILL-1965
> URL: https://issues.apache.org/jira/browse/DRILL-1965
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Jason Altekruse
> Assignee: Jason Altekruse
>
> The additional types we added to the parquet spec to allow use of parquet as
> a general purpose export format for Drill query results have not all been
> thoroughly tested. We should write a better set of tests to ensure that the
> read and write paths for these types all work properly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)