Hi,
A new feature to "skip" or "flag" invalid rows would be nice. Of
course it should be optional (you need to explicitly configure it).
What about:
-- just ignore invalid rows
SELECT * FROM CSVREAD('data/test.tsv',
null, 'skipInvalidRows=true');
-- returned columns: ID, NAME
-- add a new column with the column name "SKIPPED"
-- that is either null (for correct rows)
-- or contains the complete row for invalid rows
SELECT * FROM CSVREAD('data/test.tsv',
null, 'invalidRowColumn=SKIPPED');
-- returned columns: SKIPPED, ID, NAME
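To make the two proposed options concrete, here is a minimal plain-Java sketch of what they would do. It is hypothetical. The options above are only a proposal and are not existing H2 settings. The names InvalidRowOptions and read are invented for illustration. The sketch assumes a row is "invalid" when its field count differs from the header's. It also splits lines naively, ignoring the quoting that org.h2.tools.Csv handles. Rows are streamed one at a time, which matters for a 30 GB file. In "skip" mode it also records which lines were dropped and why.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Pattern;

// Hypothetical emulation of the proposed options; this is NOT H2 code.
// Assumption: a row is "invalid" when its field count differs from the header's.
// Quoting/escaping is ignored (plain split), unlike a real CSV parser.
public class InvalidRowOptions {

    // invalidRowColumn == null -> skip invalid rows and append a reason to 'log'
    //                             (like the proposed skipInvalidRows=true).
    // invalidRowColumn != null -> prepend a column with that name: null for
    //                             valid rows, the raw line for invalid ones
    //                             (like the proposed invalidRowColumn=...).
    static void read(BufferedReader in, String sep, String invalidRowColumn,
                     Consumer<String[]> sink, List<String> log) {
        Pattern splitter = Pattern.compile(Pattern.quote(sep));
        try {
            String headerLine = in.readLine();
            if (headerLine == null) {
                return; // empty input: nothing to emit
            }
            String[] header = splitter.split(headerLine, -1);
            sink.accept(invalidRowColumn == null ? header : prepend(invalidRowColumn, header));
            long lineNo = 1; // the header is line 1
            String line;
            // One row at a time, so the whole file never sits in memory.
            while ((line = in.readLine()) != null) {
                lineNo++;
                String[] fields = splitter.split(line, -1);
                boolean valid = fields.length == header.length;
                if (invalidRowColumn == null) {
                    if (valid) {
                        sink.accept(fields);
                    } else {
                        log.add("line " + lineNo + ": expected " + header.length
                                + " fields, got " + fields.length);
                    }
                } else if (valid) {
                    sink.accept(prepend(null, fields));
                } else {
                    // Assumption: the data columns of an invalid row are left null.
                    sink.accept(prepend(line, new String[header.length]));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String[] prepend(String first, String[] rest) {
        String[] row = new String[rest.length + 1];
        row[0] = first;
        System.arraycopy(rest, 0, row, 1, rest.length);
        return row;
    }

    public static void main(String[] args) {
        String tsv = "ID\tNAME\n1\tHello\n2\n3\tWorld\n";
        List<String> log = new ArrayList<>();
        read(new BufferedReader(new StringReader(tsv)), "\t", null,
                row -> System.out.println(Arrays.toString(row)), log);
        System.out.println(log); // [line 3: expected 2 fields, got 1]
        read(new BufferedReader(new StringReader(tsv)), "\t", "SKIPPED",
                row -> System.out.println(Arrays.toString(row)), new ArrayList<>());
    }
}
```

For a real file, java.nio.file.Files.newBufferedReader(path) could supply the reader. The sink could then insert each row through a JDBC PreparedStatement.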
What do you think about this feature? (I don't plan to implement it
right now, but patches are welcome)
Regards,
Thomas
On Tue, Mar 27, 2012 at 4:46 PM, Noel Grandin <[email protected]> wrote:
> No such option at the moment.
>
> But you could cut and paste the CSV code in org.h2.tools.Csv and write your
> own importer.
>
> The existing stuff in H2 isn't meant to cover every possibility, but the
> beauty of open-source is that you can use the code to roll your own solution
> :-)
>
> Or look at using an open-source ETL tool like http://www.cloveretl.com/
>
> On 2012-03-27 16:41, Lizard Lizard wrote:
>>
>> I'm having an issue where a very large (30 GB) CSV import aborts at
>> various points due to small errors in a very tiny number of rows. As
>> each one is discovered, I can rewrite the import query to deal with
>> it, but it's a slow process to wait ten hours, get an error, track it
>> down, start over, wait 11 hours, get an error, etc. The size of the
>> file makes it difficult to correct it manually, and the export of the
>> file is done and cannot be easily redone, so the errors in the file
>> can't be corrected by regenerating it from scratch.
>>
>> Is there any option, flag, or setting which will cause H2 to simply
>> skip invalid rows, possibly with a log as to which were skipped and
>> why?
>>
>> There are other solutions, including ways to clean the data somewhat or
>> to split the file, but they're generally inferior to a "skip and
>> log". So far we're talking about fewer than a dozen bad rows out of ~40
>> million records.
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "H2 Database" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/h2-database?hl=en.
>