[ 
https://issues.apache.org/jira/browse/HADOOP-19714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HADOOP-19714:
------------------------------------
    Labels: pull-request-available  (was: )

> testCombineFileInputFormat is overly constrained and can sometimes fail
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-19714
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19714
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Paco Chan
>            Priority: Trivial
>              Labels: pull-request-available
>
> The Hadoop documentation states that the number of paths per split in 
> {{CombineFileInputFormat}} is not fixed and can vary. 
>  
> {quote}"If a maxSplitSize is specified, then blocks on the same node are 
> combined to form a single split. Blocks that are left over are then combined 
> with other blocks in the same rack." 
> [hadoop.apache.org|https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html]
> {quote}
> This means that the number of paths in a split is determined by the block 
> placement and the configuration settings, leading to potential variations in 
> the number of paths per split.
>  
> As a result, the test fails intermittently, depending on how the blocks are 
> grouped into splits. The test could be reworked so that it no longer asserts 
> an exact number of paths in each split. 
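One possible shape for the reworked assertion is sketched below. It is only an illustration, not the actual patch: the class name, the helper name, and the file paths are hypothetical, and each split is modelled as a plain list of path strings so the example runs without Hadoop on the classpath. In the real test the paths would presumably come from CombineFileSplit.getPaths(). The check asserts that the splits together cover every expected path exactly once and ignores how the paths are grouped.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SplitCoverageCheck {

    /**
     * Returns true when the splits, taken together, contain every expected
     * path exactly once. How the paths are grouped into splits is ignored.
     * (Hypothetical helper; splits are modelled as lists of path strings.)
     */
    static boolean coversEachPathOnce(List<List<String>> splits, Set<String> expected) {
        List<String> seen = new ArrayList<>();
        for (List<String> split : splits) {
            seen.addAll(split);
        }
        // Equal total count plus equal path sets means no path is missing or duplicated.
        return seen.size() == expected.size() && new HashSet<>(seen).equals(expected);
    }

    public static void main(String[] args) {
        // Hypothetical file names, used only for illustration.
        Set<String> expected = Set.of("/dir1/file1", "/dir2/file2", "/dir3/file3");

        // Two different groupings of the same three files into splits.
        List<List<String>> groupingA = List.of(
                List.of("/dir1/file1", "/dir2/file2"),
                List.of("/dir3/file3"));
        List<List<String>> groupingB = List.of(
                List.of("/dir1/file1"),
                List.of("/dir2/file2"),
                List.of("/dir3/file3"));

        System.out.println(coversEachPathOnce(groupingA, expected));
        System.out.println(coversEachPathOnce(groupingB, expected));
    }
}
```

Both groupings pass the same check, so a test written this way would not depend on block placement, while a missing or duplicated path would still be caught.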



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
