[jira] [Commented] (TAJO-1905) Insert clause to partitioned table fails on S3

ASF GitHub Bot (JIRA) Thu, 11 Feb 2016 09:07:49 -0800

    [ https://issues.apache.org/jira/browse/TAJO-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143060#comment-15143060 ]


ASF GitHub Bot commented on TAJO-1905:
--------------------------------------

GitHub user blrunner opened a pull request:

    https://github.com/apache/tajo/pull/959

    TAJO-1905: Insert clause to partitioned table fails on S3

    Currently, the Tajo output committer works as follows:
    * Each task writes its output to a temporary directory.
    * ``FileTablespace::commitTable`` renames the first successful task's temporary directory to the final destination.
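The rename-based flow above can be sketched in Java on the local file system. This is a minimal illustration under stated assumptions, not Tajo's actual `FileTablespace` code: the class `RenameCommitSketch`, its methods, and the `stagingDir/taskId` layout are hypothetical, and `java.nio.file` stands in for a Hadoop `FileSystem`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Hypothetical sketch of a rename-based commit; not Tajo's FileTablespace code. */
public class RenameCommitSketch {

  /** A task writes one file under stagingDir/taskId (hypothetical layout). */
  static Path writeTaskOutput(Path stagingDir, String taskId, String fileName, String data)
      throws IOException {
    Path taskDir = stagingDir.resolve(taskId);
    Files.createDirectories(taskDir);
    return Files.write(taskDir.resolve(fileName), data.getBytes(StandardCharsets.UTF_8));
  }

  /** Renames the first successful task's staging directory to the final destination. */
  static Path commitTable(Path stagingDir, List<String> successfulTaskIds, Path finalDir)
      throws IOException {
    if (successfulTaskIds.isEmpty()) {
      throw new IllegalStateException("no successful task to commit");
    }
    Path src = stagingDir.resolve(successfulTaskIds.get(0));
    Files.createDirectories(finalDir.getParent());
    // A directory rename is metadata-only on a local file system (and on HDFS);
    // no file data is copied.
    return Files.move(src, finalDir, StandardCopyOption.ATOMIC_MOVE);
  }

  /** Runs one write-then-commit cycle in a temp directory; returns the committed file's text. */
  static String demo() {
    try {
      Path root = Files.createTempDirectory("rename-commit");
      Path staging = root.resolve(".staging");
      writeTaskOutput(staging, "task_0", "part-0", "row1\n");
      Path committed = commitTable(staging, List.of("task_0"), root.resolve("lineitem"));
      return new String(Files.readAllBytes(committed.resolve("part-0")), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.print(demo());
  }
}
```

The commit is cheap only because the underlying store supports a real directory rename; that assumption is what breaks on S3.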
    
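    However, this approach can fail with a FileNotFoundException because S3 is eventually consistent: S3A implements a directory rename as a copy of every object followed by a delete, and recently written objects may not yet be visible when the rename or a later status check runs. To resolve this, I implemented an output committer for S3 that works as follows:
    * Each task writes its output to local disk instead of S3 (for CTAS and INSERT statements).
    * ``S3TableSpace::commitTable`` copies the first successful task's temporary directory to S3.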
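The copy-based S3 flow can be sketched in the same hypothetical style. `ObjectStore` and `InMemoryStore` are invented stand-ins for an S3 client (not the S3A or AWS SDK API), and `LocalStagingCommitSketch.commitTable` only mirrors the idea of `S3TableSpace::commitTable`, not its code: each file in the first successful task's local directory is uploaded to its final key instead of renaming a staged S3 directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of a copy-based commit from local disk to an object store. */
public class LocalStagingCommitSketch {

  /** Hypothetical stand-in for an S3 client: a flat key -> bytes namespace with no rename. */
  interface ObjectStore {
    void put(String key, byte[] data);
  }

  /** In-memory ObjectStore used only for this sketch. */
  static final class InMemoryStore implements ObjectStore {
    final Map<String, byte[]> objects = new TreeMap<>();

    @Override
    public void put(String key, byte[] data) {
      objects.put(key, data.clone());
    }
  }

  /** Uploads every file of the first successful task's local directory under destPrefix. */
  static List<String> commitTable(Path localStaging, List<String> successfulTaskIds,
                                  ObjectStore store, String destPrefix) throws IOException {
    if (successfulTaskIds.isEmpty()) {
      throw new IllegalStateException("no successful task to commit");
    }
    Path taskDir = localStaging.resolve(successfulTaskIds.get(0));
    List<Path> files;
    try (Stream<Path> walk = Files.walk(taskDir)) {
      files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    }
    List<String> keys = new ArrayList<>();
    for (Path file : files) {
      // Keep partition subdirectories such as l_shipdate=1996-01-30 in the object key.
      String rel = taskDir.relativize(file).toString().replace('\\', '/');
      String key = destPrefix + "/" + rel;
      store.put(key, Files.readAllBytes(file));
      keys.add(key);
    }
    return keys;
  }

  /** Stages one partition file locally, commits it, and returns the resulting objects. */
  static Map<String, byte[]> demo() {
    try {
      Path staging = Files.createTempDirectory("local-staging");
      Path partition = staging.resolve("task_0").resolve("l_shipdate=1996-01-30");
      Files.createDirectories(partition);
      Files.write(partition.resolve("part-0"), "row1\n".getBytes(StandardCharsets.UTF_8));
      InMemoryStore store = new InMemoryStore();
      commitTable(staging, List.of("task_0"), store, "s3a://bucket/lineitem");
      return store.objects;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    demo().keySet().forEach(System.out::println);
  }
}
```

Because nothing is staged in S3, the commit never depends on S3 listing or renaming objects that were just written.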
    
    This PR depends on https://github.com/apache/tajo/pull/952. With this PR, CTAS and INSERT statements on a partitioned table ran successfully. For reference, this approach was inspired by the Netflix slides on integrating Spark at petabyte scale (http://www.slideshare.net/piaozhexiu/netflix-integrating-spark-at-petabyte-scale-53391704).
 
    
    To resolve this issue fundamentally, each task needs to write its output directly to the final destination, and we need to implement a pluggable output committer. However, that looks like long-term work, so I think this PR can serve as an interim step toward a pluggable output committer.
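A pluggable output committer along the lines described above might look like the following sketch. Everything here is hypothetical: `OutputCommitter`, `RenameCommitter`, `DirectCommitter`, and selection by URI scheme are illustrative assumptions, not an existing Tajo API. `DirectCommitter` models the long-term idea of tasks writing straight to the final destination, and commit steps are returned as strings instead of performing I/O.

```java
import java.net.URI;
import java.util.List;

/** Hypothetical sketch of a pluggable output committer; none of these types exist in Tajo. */
public class PluggableCommitterSketch {

  /** Hypothetical committer contract. */
  interface OutputCommitter {
    /** Where a task should write its output. */
    String taskOutputLocation(String finalDir, String taskId);

    /** Steps needed to publish the job's output (returned as strings for illustration). */
    List<String> commitJob(String finalDir, List<String> successfulTaskIds);
  }

  /** Tasks write to a staging directory; the job commit renames the first successful one. */
  static final class RenameCommitter implements OutputCommitter {
    @Override
    public String taskOutputLocation(String finalDir, String taskId) {
      return finalDir + "/.staging/" + taskId;
    }

    @Override
    public List<String> commitJob(String finalDir, List<String> successfulTaskIds) {
      if (successfulTaskIds.isEmpty()) {
        throw new IllegalStateException("no successful task to commit");
      }
      String src = taskOutputLocation(finalDir, successfulTaskIds.get(0));
      return List.of("rename " + src + " -> " + finalDir);
    }
  }

  /** Tasks write straight to the final destination, so the job commit has nothing to move. */
  static final class DirectCommitter implements OutputCommitter {
    @Override
    public String taskOutputLocation(String finalDir, String taskId) {
      return finalDir;
    }

    @Override
    public List<String> commitJob(String finalDir, List<String> successfulTaskIds) {
      return List.of();
    }
  }

  /** Picks a committer by URI scheme; a real design might read a configuration key instead. */
  static OutputCommitter forDestination(String finalDir) {
    String scheme = URI.create(finalDir).getScheme();
    if ("s3a".equals(scheme) || "s3".equals(scheme)) {
      return new DirectCommitter();
    }
    return new RenameCommitter();
  }

  public static void main(String[] args) {
    String hdfsDir = "hdfs://namenode/warehouse/lineitem";
    System.out.println(forDestination(hdfsDir).commitJob(hdfsDir, List.of("task_0")));
    String s3Dir = "s3a://bucket/lineitem";
    System.out.println(forDestination(s3Dir).commitJob(s3Dir, List.of("task_0")));
  }
}
```

Writing directly to the destination removes the rename, but a failed or speculatively executed task can then leave partial files at the final location, so a real direct committer would also need a cleanup strategy.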
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/blrunner/tajo TAJO-1905

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/959.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #959
    
----

----


> Insert clause to partitioned table fails on S3
> ----------------------------------------------
>
>                 Key: TAJO-1905
>                 URL: https://issues.apache.org/jira/browse/TAJO-1905
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: QueryMaster, S3
>            Reporter: Jinho Kim
>            Assignee: Jaehwa Jung
>             Fix For: 0.12.0
>
>
> Here is the error log
> {noformat}
> 2015-10-02 18:54:40,399 ERROR org.apache.hadoop.fs.s3a.S3AFileSystem: rename: src not found s3a://bucket/tpch-1g-p/lineitem/.staging/q_1443779192380_0001/RESULT/l_shipdate=1996-01-30
> 2015-10-02 18:54:51,357 ERROR org.apache.hadoop.fs.s3a.S3AFileSystem: rename: src not found s3a://bucket/tpch-1g-p/lineitem/.staging/q_1443779192380_0001/RESULT/l_shipdate=1993-11-09
> 2015-10-02 18:55:03,955 ERROR org.apache.tajo.querymaster.Query: No such file or directory: s3a://bucket/lineitem/l_shipdate=1994-02-02
> java.io.FileNotFoundException: No such file or directory: s3a://bucket/tpch-1g-p/lineitem/l_shipdate=1994-02-02
> at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:996)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
> at org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1467)
> at org.apache.tajo.querymaster.Query$QueryCompletedTransition.getPartitionsWithContentsSummary(Query.java:550)
> at org.apache.tajo.querymaster.Query$QueryCompletedTransition.finalizeQuery(Query.java:512)
> at org.apache.tajo.querymaster.Query$QueryCompletedTransition.transition(Query.java:446)
> at org.apache.tajo.querymaster.Query$QueryCompletedTransition.transition(Query.java:435)
> at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at org.apache.tajo.querymaster.Query.handle(Query.java:874)
> at org.apache.tajo.querymaster.Query.handle(Query.java:63)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
> at java.lang.Thread.run(Thread.java:745)
> 2015-10-02 18:55:03,958 INFO org.apache.tajo.querymaster.Query: q_1443779192380_0001 Query Transitioned from QUERY_RUNNING to QUERY_ERROR
> 2015-10-02 18:55:03,958 INFO org.apache.tajo.querymaster.Query: Processing q_1443779192380_0001 of type DIAGNOSTIC_UPDATE
> 2015-10-02 18:55:03,958 INFO org.apache.tajo.querymaster.QueryMasterTask: Query completion notified from q_1443779192380_0001 final state: QUERY_ERROR
> 2015-10-02 18:55:03,960 INFO org.apache.tajo.querymaster.QueryMasterTask: Stopping QueryMasterTask:q_1443779192380_0001
> 2015-10-02 18:55:03,960 INFO org.apache.tajo.querymaster.QueryMasterTask: Cleanup resources of all workers. Query: q_1443779192380_0001, workers: 1
> 2015-10-02 18:55:03,962 INFO org.apache.tajo.querymaster.QueryMasterTask: Stopped QueryMasterTask:q_1443779192380_0001
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
