[ https://issues.apache.org/jira/browse/TAJO-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100349#comment-14100349 ]

ASF GitHub Bot commented on TAJO-931:
-------------------------------------

Github user blrunner commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/119#discussion_r16338875
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ColPartitionStoreExec.java ---
    @@ -67,6 +79,15 @@ public ColPartitionStoreExec(TaskAttemptContext context, StoreTableNode plan, Ph
           meta = CatalogUtil.newTableMeta(plan.getStorageType());
         }
     
    +    if (!(plan instanceof InsertNode)) {
    +      String nullChar = context.getQueryContext().get(SessionVars.NULL_CHAR);
    +      meta.putOption(StorageConstants.CSVFILE_NULL, nullChar);
    --- End diff --
    
    You also need to handle the other null-character options,
    StorageConstants.SEQUENCEFILE_NULL and StorageConstants.RCFILE_NULL; setting
    only StorageConstants.CSVFILE_NULL covers just the CSV format.
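
    A minimal sketch of the reviewer's point, assuming the session null character
    should be written under every format-specific key rather than only the CSV
    one. It does not use Tajo's classes: the Map stands in for the table meta
    options, the three key strings are assumed values for
    StorageConstants.CSVFILE_NULL, SEQUENCEFILE_NULL and RCFILE_NULL (check the
    real constants), and putNullCharOptions is a hypothetical helper.

```java
import java.util.HashMap;
import java.util.Map;

class NullCharOptions {
  // Hypothetical stand-ins for the StorageConstants keys; the real string
  // values in Tajo may differ.
  static final String CSVFILE_NULL = "csvfile.null";
  static final String SEQUENCEFILE_NULL = "sequencefile.null";
  static final String RCFILE_NULL = "rcfile.null";

  // Writes the session null character under every format-specific key, so the
  // setting takes effect whether the output is CSV, SequenceFile or RCFile.
  static void putNullCharOptions(Map<String, String> options, String nullChar) {
    if (nullChar == null) {
      return; // design choice: keep format defaults when the variable is unset
    }
    for (String key : new String[] {CSVFILE_NULL, SEQUENCEFILE_NULL, RCFILE_NULL}) {
      options.put(key, nullChar);
    }
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    putNullCharOptions(options, "\\N");
    System.out.println(options.get(RCFILE_NULL));
  }
}
```

    Setting all three keys is the simplest option; an alternative would be to set
    only the key that matches plan.getStorageType().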


> Output file can be punctuated depending on the file size.
> ---------------------------------------------------------
>
>                 Key: TAJO-931
>                 URL: https://issues.apache.org/jira/browse/TAJO-931
>             Project: Tajo
>          Issue Type: Improvement
>          Components: physical operator
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>             Fix For: 0.9.0
>
>
> Some file formats (e.g., Parquet) are not splittable. A very large file in
> such a format can span multiple HDFS blocks, which causes remote HDFS reads
> and limits the degree of parallelism, resulting in significant performance
> degradation.
> We can solve this problem if StoreTableExec or
> {Col|SortBased}PartitionStoreExec can punctuate (split) the final output into
> multiple files according to the number of bytes written.
> In addition, we need a session variable that determines the size of each
> final output file. Hence, TAJO-928 blocks this issue.
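
A minimal, hypothetical sketch of punctuating output by written size: the writer
starts a new output file once the bytes written to the current one reach
maxFileSize. In-memory buffers stand in for HDFS files, SizeBasedRollingWriter is
an illustrative name rather than Tajo code, and maxFileSize stands in for the
per-file size session variable that TAJO-928 is expected to provide.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: rolls over to a new "file" when the current one has
// reached maxFileSize bytes. ByteArrayOutputStream stands in for an HDFS file.
class SizeBasedRollingWriter {
  private final long maxFileSize;
  private final List<ByteArrayOutputStream> files = new ArrayList<>();
  private ByteArrayOutputStream current;

  SizeBasedRollingWriter(long maxFileSize) {
    if (maxFileSize <= 0) {
      throw new IllegalArgumentException("maxFileSize must be positive");
    }
    this.maxFileSize = maxFileSize;
  }

  // Writes one newline-terminated record. The size check happens before the
  // write, so records are never split and a file can overshoot maxFileSize by
  // less than one record's length.
  void write(String record) {
    byte[] bytes = (record + "\n").getBytes(StandardCharsets.UTF_8);
    if (current == null || current.size() >= maxFileSize) {
      current = new ByteArrayOutputStream();
      files.add(current);
    }
    current.write(bytes, 0, bytes.length);
  }

  int fileCount() {
    return files.size();
  }

  List<Integer> fileSizes() {
    List<Integer> sizes = new ArrayList<>();
    for (ByteArrayOutputStream f : files) {
      sizes.add(f.size());
    }
    return sizes;
  }

  public static void main(String[] args) {
    SizeBasedRollingWriter writer = new SizeBasedRollingWriter(10);
    for (int i = 0; i < 5; i++) {
      writer.write("row" + i); // each record is 5 bytes including the newline
    }
    System.out.println(writer.fileCount() + " files, sizes " + writer.fileSizes());
  }
}
```

With maxFileSize = 10 and five 5-byte records, main prints
"3 files, sizes [10, 10, 5]". Because a file can overshoot the limit by less than
one record, setting the limit a little below the HDFS block size is one way to
keep each non-splittable file within a single block.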



--
This message was sent by Atlassian JIRA
(v6.2#6252)
