GitHub user traflm opened a pull request:

    https://github.com/apache/incubator-trafodion/pull/441

    [TRAFODION-1920] suppress SQL error during HIVE_SCAN when encounter i…

    …nvalid value, assign null to the invalid value
    
    This pull request try to solve the issue when Trafodion try to retrieve 
data from a HDFS text file which contains invalid data. Like:
    space for a numeric value
    non-numeric string for a numeric value
    
    this change will use the HIVE_SCAN_SPECAIL_MODE '2' , when that CQD is set 
to 2, HIVE_SCAN will automaticlly convert the invalid data into null and 
continue, instead of error out immediately.
    
    How it does?
    HIVE_SCAN need to do the data type conversion via expression evaluation. 
There are two ways to evaluate an expression: pCode and eval() function. Since 
the current HIVE_SCAN do conversion row by row, and details are encoded in 
corresponding pCode. So the simplest way I can find out is to fix the invalid 
data by set it to null and retry the whole evaluation.
    In pCode and eval() function, right after a failure, it saves the offset of 
the offending source data pointer. In HIVE_SCAN work() function, once it got an 
evaluation error, find and fix the offending data in form of "R2" , exploded 
format, and retry the conversion. The maxium retry time is the number of 
columns.
    So this change will only change pCode and normal evaluation a little bit. 
After an error, try to remember the offset of offending source data. It will 
not affect any normal evaluation/execution.
    Only in HIVE_SCAN, in a special mode controlled by CQD, it will retry after 
the failure.
    This will not introduce any unknown side effort.
    It is easier and maybe faster to change pCode execution to set the target 
as null and continue, but since pCode is used in many cases, not only HIVE_SCAN 
move experession. So that change will introduce unknown side effect.
    This retry logic does introduce overhead, but in normal case the code path 
is same as before, in special mode, if no invalid data, it will also go through 
same path as before, just retry in case there is converting error.
    This change only fix from HIVE_SCAN perspective, to fully support 
transparent Bulkloader, we still need to change the bulkloader work() method to 
tolerate other source of failure. This change will make sure there will be no 
failure during HIVE_SCAN. Later, if there is CAST, or TRANSLATE or TRUNCATE 
issues, we still need to change bulkload code.
    So this is the first part to fully support TRAFODION-1920, but it should 
handle a lot of real cases already.
    Add some sample test cases in hive/TEST005 also, interesting reviewers can 
check the test for more details.
    Please help to review, and this will be a help in some real use cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/traflm/incubator-trafodion TRAFODION-1912

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-trafodion/pull/441.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #441
    
----
commit 8f78b446e9002ff9cd5e0f3ca2b5feead9726859
Author: Liu Ming <[email protected]>
Date:   2016-04-19T03:02:16Z

    [TRAFODION-1920] suppress SQL error during HIVE_SCAN when encounter invalid 
value, assign null to the invalid value

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to