[ 
https://issues.apache.org/jira/browse/DERBY-6894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15369155#comment-15369155
 ] 

Bryan Pendleton commented on DERBY-6894:
----------------------------------------

Hi Danoja, thanks for the updated patch. It makes a lot of sense to me, and I
think I understand what's going on.

As a general question, is it true that the new "readHeaders" method that you
added to the Import class will, if 'skip > 0', open the input file, read the
first 'skip' number of lines, parse the column structure, and construct an array
of the column headers, then close the input file and return the headers array?

If I am understanding this correctly, does this mean that we will open the input
file twice in this case? First to read the headers, then we will close it, then
we will re-open it, skip those header lines, and read the data portion of the
file?

Assuming I'm understanding it correctly, it seems like a good general 
technique, and the code is very clean, but I am a little concerned that some
of the possible input data types to the importer may not be capable of
this open-read-close-reopen-reread-close pattern, because they might
only be able to be opened and read a single time.

Do you think it might be possible, in a future implementation, to process the
file in a single pass, where we open it and read the first few lines, then
leave the file open, then resume reading the remainder of the lines as data?

I don't think we necessarily have to do this in the first implementation; we
could report a new issue to handle this as a follow-on improvement, but it
seems useful to think about the possibility at this time.
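
To make the single-pass idea concrete, here is a rough sketch. It is not code
from the patch: the class name, the readOnce method, and the Result holder are
all hypothetical, and it assumes that the last of the 'skip' lines is the one
carrying the column names and that fields are separated by a plain delimiter
with no quoting. The point is only that the same reader is used for the header
lines and then for the data lines, so the input is never reopened.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names exist in Derby.
class SinglePassImportSketch {

    /** Column headers plus the data rows, both read from one open reader. */
    static final class Result {
        final String[] headers;
        final List<String[]> rows;

        Result(String[] headers, List<String[]> rows) {
            this.headers = headers;
            this.rows = rows;
        }
    }

    /**
     * Reads 'skip' header lines, then keeps reading the remaining lines as
     * data from the same reader. The caller owns (and closes) the reader.
     */
    static Result readOnce(Reader in, int skip, String delimiter) {
        try {
            BufferedReader reader = new BufferedReader(in);
            String[] headers = new String[0];
            for (int i = 0; i < skip; i++) {
                String line = reader.readLine();
                if (line == null) {
                    break; // input is shorter than 'skip' lines
                }
                // Assumption: the last skipped line holds the column names.
                if (i == skip - 1) {
                    headers = split(line, delimiter);
                }
            }
            // The reader is still open and positioned at the first data line.
            List<String[]> rows = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(split(line, delimiter));
            }
            return new Result(headers, rows);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Naive split: trims each field and does not handle quoted delimiters.
    private static String[] split(String line, String delimiter) {
        String[] parts = line.split(Pattern.quote(delimiter), -1);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }
}
```

Because the method takes a Reader rather than a file name, it would work the
same way for an input that can only be consumed once.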



> Enhance COLUMNINDEXES parsing for SYSCS_IMPORT_DATA_BULK to recognize columns by name
> -------------------------------------------------------------------------------------
>
>                 Key: DERBY-6894
>                 URL: https://issues.apache.org/jira/browse/DERBY-6894
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Bryan Pendleton
>            Assignee: Danoja Dias
>            Priority: Minor
>         Attachments: Derby-6894.diff, NewDerby6894.diff, noHeaderLines.csv, petlist.csv, repro.java, repro.java
>
>
> To ease maintainability and legibility of client programs, it would be
> nice if callers of SYSCS_IMPORT_DATA_BULK (and possibly also
> SYSCS_IMPORT_DATA) could refer to columns in the COLUMNINDEXES
> argument by column *NAME*, as well as by index *NUMBER*.
> So, for example, a valid COLUMNINDEXES specification might be:
>     '1,3,LastName,FirstName,7'
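
As an illustration of the mixed specification quoted above, the following
sketch resolves each COLUMNINDEXES token to a 1-based column position. It is
not the patch's implementation: the class and method names are hypothetical,
the header array is assumed to have been read from the input file already, and
name matching is assumed to be exact (case-sensitive).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the parsing code from the DERBY-6894 patch.
class ColumnIndexesSketch {

    /**
     * Turns a specification such as '1,3,LastName,FirstName,7' into 1-based
     * column positions. Numeric tokens are used as given; any other token is
     * looked up by name in the header array.
     */
    static int[] resolve(String columnIndexes, String[] headers) {
        List<String> names = Arrays.asList(headers);
        String[] tokens = columnIndexes.split(",");
        int[] positions = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i].trim();
            if (token.matches("\\d+")) {
                // Already an index number: keep it unchanged.
                positions[i] = Integer.parseInt(token);
            } else {
                int found = names.indexOf(token);
                if (found < 0) {
                    throw new IllegalArgumentException(
                            "Unknown column name: " + token);
                }
                positions[i] = found + 1; // convert to 1-based
            }
        }
        return positions;
    }
}
```

With headers Id, Title, LastName, Age, FirstName, the specification
'1,3,LastName,FirstName,7' would resolve to 1, 3, 3, 5, 7 under these
assumptions.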



