[ https://issues.apache.org/jira/browse/TAJO-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyunsik Choi updated TAJO-931:
------------------------------
Description:
Some file formats (e.g., Parquet) are not splittable. A very large file in such
a format usually spans multiple HDFS blocks, yet a single task must read the
whole file. This causes remote HDFS reads and limits the degree of parallelism,
resulting in significant performance degradation.
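To make the parallelism argument concrete, the following Java sketch counts how many HDFS blocks one file occupies and how many tasks can read it. The 128 MiB block size and 1 GiB file size are assumptions chosen for illustration, and `BlockSpan` is a hypothetical helper, not part of Tajo.

```java
/** Hypothetical helper: block count and read parallelism for one output file. */
public final class BlockSpan {
    private BlockSpan() {}

    /** Number of HDFS blocks a file of the given size occupies (ceiling division). */
    public static long blocksSpanned(long fileBytes, long blockBytes) {
        if (fileBytes <= 0) {
            return 0;
        }
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    /** A non-splittable file is read by at most one task, whatever its size. */
    public static long readTasks(long fileBytes, long blockBytes, boolean splittable) {
        long blocks = blocksSpanned(fileBytes, blockBytes);
        return splittable ? blocks : Math.min(blocks, 1);
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long block = 128 * mib;   // assumed HDFS block size
        long file = 1024 * mib;   // assumed size of one large Parquet file
        System.out.println(blocksSpanned(file, block));    // 8
        System.out.println(readTasks(file, block, false)); // 1
        System.out.println(readTasks(file, block, true));  // 8
    }
}
```

With these numbers the single reader of the non-splittable file has to pull up to seven of the eight blocks from other DataNodes, unless replicas happen to be local, while eight block-sized files could be read by eight tasks.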
We can solve this problem if StoreTableExec or
{Col|SortBased}PartitionStoreExec can punctuate the final output, i.e., close
the current output file and start a new one once the written size reaches a
threshold.
In addition, we need a session variable that determines the per-file size of
final output files. Therefore, this issue is blocked by TAJO-928.
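A minimal sketch of the proposed punctuation, assuming a writer that can only cut files at record boundaries. `RollingOutput` and `maxFileBytes` are hypothetical names (`maxFileBytes` stands in for the per-file size session variable described above); the sketch keeps files in memory instead of writing to HDFS and is not Tajo's actual StoreTableExec code.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical size-based rolling writer: starts a new file once the current one is full. */
public final class RollingOutput {
    private final long maxFileBytes;                   // stand-in for the per-file size session variable
    private final List<byte[]> closedFiles = new ArrayList<>();
    private ByteArrayOutputStream current = new ByteArrayOutputStream();

    public RollingOutput(long maxFileBytes) {
        if (maxFileBytes <= 0) {
            throw new IllegalArgumentException("maxFileBytes must be positive");
        }
        this.maxFileBytes = maxFileBytes;
    }

    /** Appends one whole record, then rolls over if the written size reached the threshold. */
    public void write(byte[] record) {
        current.write(record, 0, record.length);
        if (current.size() >= maxFileBytes) {
            roll();
        }
    }

    /** Closes the current file and starts an empty one. */
    private void roll() {
        closedFiles.add(current.toByteArray());
        current = new ByteArrayOutputStream();
    }

    /** Closes the last, possibly smaller, file and returns all output files in order. */
    public List<byte[]> close() {
        if (current.size() > 0) {
            roll();
        }
        return closedFiles;
    }

    public static void main(String[] args) {
        RollingOutput out = new RollingOutput(100);    // 100-byte limit per file
        byte[] record = new byte[30];                  // fixed-size 30-byte records
        for (int i = 0; i < 10; i++) {
            out.write(record);
        }
        for (byte[] file : out.close()) {
            System.out.println(file.length);           // prints 120, 120, 60
        }
    }
}
```

Checking the size after each record means a file can overshoot the threshold by less than one record (here 120 bytes against a 100-byte limit); checking before the write would keep files under the limit, but requires knowing each record's encoded size in advance.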
> Output file can be punctuated depending on the file size.
> ---------------------------------------------------------
>
> Key: TAJO-931
> URL: https://issues.apache.org/jira/browse/TAJO-931
> Project: Tajo
> Issue Type: Improvement
> Components: physical operator
> Reporter: Hyunsik Choi
> Fix For: 0.9.0
--
This message was sent by Atlassian JIRA
(v6.2#6252)