[jira] [Comment Edited] (DERBY-4555) Expand SYSCS_IMPORT_TABLE to accept CSV file with header lines

Yair Lenga (JIRA) Mon, 20 Jun 2016 10:11:26 -0700

    [ 
https://issues.apache.org/jira/browse/DERBY-4555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339922#comment-15339922
 ]


Yair Lenga edited comment on DERBY-4555 at 6/20/16 5:11 PM:
------------------------------------------------------------

Lots of good questions. I'll try to provide feedback based on my own experience 
and requirements, which might differ from those of other users.

For #1 and #2:
import_data vs. import_table. I think import_data is much more important. It is 
a superset of import_table. Import_table works only in a few cases, when the 
external data is aligned with the internal table structure - for example, when 
the data was exported from the table. I would vote to focus only on import_data 
for the new functionality, and ignore import_table at this time. In retrospect, 
I should have highlighted these priorities in the ticket.

For name matching: I suggest an EXACT match on column names in the CSV. Most 
users of this extension are going to bring in data generated in another system, 
one that generates data in a known pattern. Also, for data export, it is common 
practice to avoid using ANY special characters in the column heading row, as 
many systems will fail to process the data. Including support for special 
characters, quoting, etc. requires a lot of work and brings mostly risk and 
complexity. In short: a lot of downside, almost no upside.
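
To illustrate what exact matching could look like, here is a minimal Java 
sketch. It is not Derby code: the class HeaderColumnResolver and its resolve 
method are hypothetical names, and it assumes a comma-delimited header row 
with no quoting and no embedded commas. It turns a mixed COLUMNINDEXES value 
such as '1,3,sales,5,' into plain numeric indexes by comparing each name 
against the header row with a case-sensitive String.equals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Derby.
public class HeaderColumnResolver {

    /**
     * Turns a column spec such as "1,3,sales,5," into a purely numeric,
     * 1-based index list by looking names up in the CSV header line.
     */
    public static String resolve(String headerLine, String columnSpec) {
        // Assumption: header names contain no commas, quotes or padding.
        String[] header = headerLine.split(",");
        List<String> resolved = new ArrayList<>();

        for (String raw : columnSpec.split(",")) {
            String token = raw.trim();
            if (token.isEmpty()) {
                continue; // tolerate a stray or trailing comma in the spec
            }
            if (token.matches("\\d+")) {
                resolved.add(token); // already a numeric index, keep as-is
                continue;
            }
            int position = -1;
            for (int i = 0; i < header.length; i++) {
                // EXACT match: case-sensitive, header text used verbatim.
                if (header[i].equals(token)) {
                    position = i + 1; // column indexes are 1-based
                    break;
                }
            }
            if (position < 0) {
                throw new IllegalArgumentException(
                        "No header column named '" + token + "'");
            }
            resolved.add(String.valueOf(position));
        }
        return String.join(",", resolved);
    }

    public static void main(String[] args) {
        String header = "id,name,sales,region,date";
        // prints 1,3,3,5
        System.out.println(resolve(header, "1,3,sales,5,"));
    }
}
```

With exact matching, a name that differs only in case (for example 'Sales' 
against a header of 'sales') is rejected with an error rather than guessed at, 
which keeps the behavior simple and predictable.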

I admit that part of my preference for a minimal solution that addresses 
my own situation is that I hope to see the fix released within a few months, 
minimizing the chance that it will get delayed due to quality issues.

Thanks for your effort! Yair.


> Expand SYSCS_IMPORT_TABLE to accept CSV file with header lines
> --------------------------------------------------------------
>
>                 Key: DERBY-4555
>                 URL: https://issues.apache.org/jira/browse/DERBY-4555
>             Project: Derby
>          Issue Type: Improvement
>          Components: Miscellaneous
>            Reporter: Yair Lenga
>            Assignee: Danoja Dias
>         Attachments: NoVarargs.diff, Varargs.diff, 
> addNewSystemProcedure_1.diff, gotException.diff, hardCoded.diff, latest.diff, 
> noHeaderLines.csv, petlist.csv, petlist.csv, petlist.csv, repro.java, 
> repro.java, repro.java, skipHeaders.diff
>
>
> The SYSCS_IMPORT_TABLE (and SYSCS_IMPORT_DATA) procedures allow import of data 
> from external resources. In general, they can process CSV files created 
> with various tools - with one exception: the header line.
> While there is no accepted standard, most tools will include a header line in 
> the CSV file with column names. This convention is supported in Excel and 
> many other tools.
> My request: extend SYSCS_IMPORT_TABLE and SYSCS_IMPORT_DATA (and other 
> related procedures) to include an extra indicator for the number of header 
> lines to be ignored.
> As an extra bonus, it would help if SYSCS_IMPORT_DATA accepted column 
> names (instead of column indexes) in the 'COLUMNINDEXES' argument. E.g., it 
> should be possible to indicate COLUMNINDEXES of '1,3,sales,5,'. This feature 
> will make it significantly easier to handle cases where the external input 
> file is extended to include additional columns.



