[
https://issues.apache.org/jira/browse/HADOOP-19714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HADOOP-19714:
------------------------------------
Labels: pull-request-available (was: )
> testCombineFileInputFormat is overly constrained and can sometimes fail
> -----------------------------------------------------------------------
>
> Key: HADOOP-19714
> URL: https://issues.apache.org/jira/browse/HADOOP-19714
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Paco Chan
> Priority: Trivial
> Labels: pull-request-available
>
> The Hadoop documentation for {{CombineFileInputFormat}} indicates that the
> number of paths per split is not fixed and can vary.
>
> {quote}"If a maxSplitSize is specified, then blocks on the same node are
> combined to form a single split. Blocks that are left over are then combined
> with other blocks in the same rack."
> [hadoop.apache.org|https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html?utm_source=chatgpt.com]
> {quote}
> This means that the number of paths in a split is determined by block
> placement and the configuration settings, so it can vary.
>
> As a result, the test sometimes fails, depending on how the blocks are
> grouped into splits. The test could be reworked so that it does not assert an
> exact number of paths in each split.
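
A minimal sketch of one way to relax such a check, assuming each split can be reduced to a list of path strings. The class `SplitPathCheck` and the method `coversEachPathOnce` are hypothetical names introduced for illustration and are not part of Hadoop or of the existing test. Instead of asserting how many paths land in each split, the check only verifies that the splits together contain every expected path exactly once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: each split is modelled as a plain list of path strings.
class SplitPathCheck {

    // Returns true when the splits, taken together, contain every expected
    // path exactly once, regardless of how the paths are grouped into splits.
    static boolean coversEachPathOnce(List<List<String>> splits, Set<String> expected) {
        List<String> seen = new ArrayList<>();
        for (List<String> split : splits) {
            seen.addAll(split);
        }
        // Equal sizes plus equal sets rules out both missing and duplicated paths.
        return seen.size() == expected.size() && new HashSet<>(seen).equals(expected);
    }

    public static void main(String[] args) {
        Set<String> expected = Set.of("/a", "/b", "/c", "/d");
        // Two different groupings of the same four paths.
        List<List<String>> grouping1 = List.of(List.of("/a", "/b"), List.of("/c", "/d"));
        List<List<String>> grouping2 = List.of(List.of("/a"), List.of("/b", "/c", "/d"));
        System.out.println(coversEachPathOnce(grouping1, expected));
        System.out.println(coversEachPathOnce(grouping2, expected));
    }
}
```

Both groupings pass the check even though the per-split path counts differ, which is the property a less constrained test would rely on. In a real test the path lists would have to be extracted from the actual split objects; that step is omitted here.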