[jira] [Commented] (ASTERIXDB-2901) Infer schema from CSV header

Michael J. Carey (Jira) Thu, 13 May 2021 08:00:08 -0700


    [ 
https://issues.apache.org/jira/browse/ASTERIXDB-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343915#comment-17343915
 ]


Michael J. Carey commented on ASTERIXDB-2901:
---------------------------------------------

One could imagine some different "infer" options - e.g., 

   "infer" = N — look at the header plus the first N rows to infer a schema

   "infer" = ALL — look at the whole file to infer a schema

   "infer" = SAMPLE(N) — pick N rows at random to infer a schema

One could also imagine no-header versions where the field names come from the 
CREATE statement and only the data types are inferred, though this seems less 
useful since it saves the user less work.
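
To make the options above concrete, here is a minimal Python sketch of how the 
three modes and the no-header variant might behave. It is an illustration only, 
not AsterixDB code: the function names (infer_type, select_rows, infer_schema), 
the ("SAMPLE", N) tuple encoding, the seed parameter, and the choice of 
bigint/double/string as the only candidate types are all assumptions made for 
this example.

```python
import csv
import io
import random


def infer_type(values):
    """Return the narrowest illustrative type name that fits all non-empty values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"  # assumption: no evidence means fall back to string

    def fits(cast):
        try:
            for v in non_empty:
                cast(v)
            return True
        except ValueError:
            return False

    if fits(int):
        return "bigint"
    if fits(float):
        return "double"
    return "string"


def select_rows(rows, infer, seed=None):
    """Pick the data rows to inspect for one hypothetical "infer" setting."""
    if infer == "ALL":
        return rows  # look at the whole file
    if isinstance(infer, int):
        return rows[:infer]  # first N rows
    if isinstance(infer, tuple) and infer[0] == "SAMPLE":
        k = min(infer[1], len(rows))
        return random.Random(seed).sample(rows, k)  # N rows at random
    raise ValueError(f"unsupported infer option: {infer!r}")


def infer_schema(text, infer, field_names=None, seed=None):
    """Infer {field name: type name} from CSV text.

    If field_names is None, the first row is treated as the header;
    otherwise the names are supplied by the caller (the no-header case).
    """
    rows = list(csv.reader(io.StringIO(text)))
    if field_names is None:
        header, rows = rows[0], rows[1:]
    else:
        header = list(field_names)
    chosen = select_rows(rows, infer, seed)
    columns = list(zip(*chosen)) if chosen else [()] * len(header)
    return {name: infer_type(col) for name, col in zip(header, columns)}
```

With this sketch, a small N can give a different answer than ALL: a column 
whose first rows are numeric but whose later rows contain text would be typed 
as numeric under "infer" = N and as string under "infer" = ALL, which is the 
trade-off between scan cost and accuracy that the options expose.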

> Infer schema from CSV header
> ----------------------------
>
>                 Key: ASTERIXDB-2901
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2901
>             Project: Apache AsterixDB
>          Issue Type: New Feature
>            Reporter: Gift Sinthong
>            Priority: Trivial
>
> When creating an external dataset from a CSV file, the system should be able 
> to infer the attribute names from the file header, if present, and sample 
> records to infer the data types. For example, the create statement could take 
> an "infer" flag giving the number of records to scan, as in the statement below.
> CREATE EXTERNAL DATASET Employee() USING localfs 
> (("path"="localhost:///employees.csv"), ("format"="delimited-text"), 
> ("delimiter"=","), ("header"=true), ("infer"=10))



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
