[ 
https://issues.apache.org/jira/browse/DRILL-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270207#comment-14270207
 ] 

Jason Altekruse commented on DRILL-1965:
----------------------------------------

Two methods for doing this have been explored. It would be useful to have a 
human-editable format so that test writers can write input files and baselines 
themselves. JSON is currently the easiest format, so I explored writing a JSON 
file with numerics or strings that could be cast into all supported types. The 
attached patch is the initial effort on this work. The interval types do not 
appear to be casting correctly as they are specified right now. I think it's 
just a formatting problem, but I am going to hand this off to Ramana for 
further work and for generating the parquet files.
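
As a rough illustration of the JSON approach described above (not the code in 
the attached patch), the sketch below builds a small hand-editable JSON input 
and a SELECT that casts each raw value to a target type. The column names, 
sample values, chosen types, and the table path are all hypothetical, and the 
interval types are left out because their string format is the open question 
mentioned above.

```python
import json

# Hypothetical column -> (target SQL type, raw JSON value) mapping.
# Names, types, and values are illustrative assumptions, not from the patch.
COLUMNS = {
    "int_col": ("INT", 42),
    "bigint_col": ("BIGINT", 1234567890123),
    "float8_col": ("DOUBLE", 1.5),
    "date_col": ("DATE", "2015-01-08"),
    "timestamp_col": ("TIMESTAMP", "2015-01-08 12:30:00"),
    "varchar_col": ("VARCHAR(20)", "hello"),
}


def to_json_lines(rows=3):
    """Return `rows` identical records, one JSON object per line."""
    record = {name: value for name, (_, value) in COLUMNS.items()}
    return "\n".join(json.dumps(record) for _ in range(rows)) + "\n"


def build_cast_query(table):
    """Build a SELECT that casts every raw JSON column to its target type."""
    casts = ", ".join(
        f"CAST({name} AS {sql_type}) AS {name}"
        for name, (sql_type, _) in COLUMNS.items()
    )
    return f"SELECT {casts} FROM {table}"


if __name__ == "__main__":
    # The file name and table path below are placeholders.
    with open("all_types.json", "w") as f:
        f.write(to_json_lines())
    print(build_cast_query("dfs.`/tmp/all_types.json`"))
```

A test writer would edit only the mapping to add a type; the same query could 
then be wrapped in a CREATE TABLE AS statement to produce the parquet files.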

The patch also includes some code to generate a physical plan that uses the 
mock-scan operator, which was a workaround I was trying before I realized the 
unsigned types were not fully implemented (there are references to them in the 
code, but they cannot be cast to and are not currently supported). This work 
did reveal some shortcomings in the generateTestData method of several of the 
value vector types, such as date and timestamp.

> Expand read and write testing for parquet across all supported types
> --------------------------------------------------------------------
>
>                 Key: DRILL-1965
>                 URL: https://issues.apache.org/jira/browse/DRILL-1965
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jason Altekruse
>            Assignee: Jason Altekruse
>
> The additional types we added to the parquet spec to allow use of parquet as 
> a general purpose export format for Drill query results have not all been 
> thoroughly tested. We should make a better set of tests to ensure that the 
> read and write paths for these types are all working properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
