[
https://issues.apache.org/jira/browse/ASTERIXDB-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343915#comment-17343915
]
Michael J. Carey commented on ASTERIXDB-2901:
---------------------------------------------
One could imagine some different "infer" options, e.g.:
  "infer" = N: look at the header plus the first N rows to infer a schema
  "infer" = ALL: look at the whole file to infer a schema
  "infer" = SAMPLE(N): pick N rows at random to infer a schema
One could also imagine no-header versions, where the field names come from the
CREATE statement and only the data types are inferred, though this seems to
make less sense (it saves the user less work).
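
To make the options above concrete, here is a minimal Python sketch of how
the three "infer" modes might behave. It is not AsterixDB code: the function
names (infer_type, infer_schema), the ("SAMPLE", n) tuple encoding, the
field_names parameter for the no-header case, and the choice of bigint /
double / string as candidate types are all assumptions made for illustration.

```python
import csv
import io
import random


def infer_type(values):
    """Return the narrowest illustrative type name that fits every non-empty value."""
    present = [v for v in values if v != ""]
    if not present:
        return "string"  # nothing to go on, so fall back to the widest type

    def fits(cast):
        try:
            for v in present:
                cast(v)
            return True
        except ValueError:
            return False

    if fits(int):
        return "bigint"
    if fits(float):
        return "double"
    return "string"


def infer_schema(text, infer="ALL", field_names=None, seed=None):
    """Infer {field name: type name} from CSV text.

    infer: an int N (first N rows), "ALL" (every row), or the tuple
           ("SAMPLE", N) (N rows picked at random). Hypothetical encoding.
    field_names: if given, the text is assumed to have no header row and
                 these names are used instead (the "no-header" variant).
    """
    rows = list(csv.reader(io.StringIO(text)))
    if field_names is None:
        field_names, rows = rows[0], rows[1:]

    if isinstance(infer, int):
        rows = rows[:infer]
    elif isinstance(infer, tuple) and infer[0] == "SAMPLE":
        rng = random.Random(seed)
        rows = rng.sample(rows, min(infer[1], len(rows)))
    elif infer != "ALL":
        raise ValueError(f"unsupported infer option: {infer!r}")

    # Transpose the rows into columns; with no rows, every column is empty.
    columns = list(zip(*rows)) if rows else [() for _ in field_names]
    return {name: infer_type(col) for name, col in zip(field_names, columns)}


if __name__ == "__main__":
    data = "name,age,salary\nAnn,30,1000.5\nBob,41,2000\nCy,n/a,3000\n"
    print(infer_schema(data, infer=2))
    print(infer_schema(data, infer="ALL"))
```

With this sample data, a prefix of two rows infers "age" as bigint, while
scanning the whole file sees the "n/a" value and falls back to string. That
is the trade-off between the options: a smaller N or sample is cheaper but
can yield a schema that later rows violate.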
> Infer schema from CSV header
> ----------------------------
>
> Key: ASTERIXDB-2901
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2901
> Project: Apache AsterixDB
> Issue Type: New Feature
> Reporter: Gift Sinthong
> Priority: Trivial
>
> Creating external datasets from CSV files should be able to infer the
> attribute names from the file header, if present, and sample records to
> infer the data types. For example, the create statement could take an
> "infer" flag giving the number of records to scan, as in the statement below:
> CREATE EXTERNAL DATASET Employee() USING localfs
> (("path"="localhost:///employees.csv"), ("format"="delimited-text"),
> ("delimiter"=","), ("header"=true), ("infer"=10))