[
https://issues.apache.org/jira/browse/TAJO-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852967#comment-13852967
]
Hyunsik Choi commented on TAJO-9:
---------------------------------
This issue is a duplicate of TAJO-435.
> Change the default intermediate data file format for hash repartitioning
> ------------------------------------------------------------------------
>
> Key: TAJO-9
> URL: https://issues.apache.org/jira/browse/TAJO-9
> Project: Tajo
> Issue Type: Improvement
> Components: data shuffle
> Reporter: Hyunsik Choi
> Assignee: Hyunsik Choi
> Fix For: 0.8-incubating
>
>
> For easier debugging, hash repartitioning has used CSV as the default
> intermediate data format. The CSV file format may cause parsing overhead, and
> it may cause relatively large intermediate data to be sent over the network.
> We need to change it to RawFile or another more efficient file format.
> Digging into the PartitionedStoredExec class is a good starting point for
> this issue.
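
The size and parsing trade-off described above can be illustrated with a minimal sketch. It does not use Tajo's RawFile format or any Tajo API. The row schema, the fixed-width layout, and all function names below are assumptions made up for illustration.

```python
import struct

# Hypothetical rows: (int32 id, int64 count, float64 score). Not a Tajo schema.
ROWS = [(1000000 + i, 9000000000 + i, 12345.678901 + i) for i in range(1000)]

# Assumed fixed-width binary layout: little-endian int32, int64, float64.
# With "<" there is no padding, so each row takes 4 + 8 + 8 = 20 bytes.
ROW_LAYOUT = struct.Struct("<iqd")


def encode_csv(rows):
    """Serialize rows as text: every number is written as decimal digits."""
    lines = (",".join(repr(v) for v in row) for row in rows)
    return ("\n".join(lines) + "\n").encode("utf-8")


def decode_csv(data):
    """Parse the text back: each field needs a string-to-number conversion."""
    rows = []
    for line in data.decode("utf-8").splitlines():
        a, b, c = line.split(",")
        rows.append((int(a), int(b), float(c)))
    return rows


def encode_binary(rows):
    """Serialize rows with the fixed-width layout, no delimiters needed."""
    return b"".join(ROW_LAYOUT.pack(*row) for row in rows)


def decode_binary(data):
    """Read rows back by copying fixed-size fields, with no text parsing."""
    return list(ROW_LAYOUT.iter_unpack(data))


if __name__ == "__main__":
    csv_bytes = encode_csv(ROWS)
    bin_bytes = encode_binary(ROWS)
    print("CSV bytes:   ", len(csv_bytes))
    print("binary bytes:", len(bin_bytes))
```

For these rows the text encoding is larger than the binary one, and decoding it requires splitting lines and converting each field. How large the gap is depends on the data: short integers can be smaller as text than as fixed-width fields.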
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)