[ 
https://issues.apache.org/jira/browse/HADOOP-19714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18024387#comment-18024387
 ] 

ASF GitHub Bot commented on HADOOP-19714:
-----------------------------------------

18chanp1 opened a new pull request, #8010:
URL: https://github.com/apache/hadoop/pull/8010

   ### Description of PR
   The Hadoop documentation states that the number of paths per split in 
CombineFileInputFormat is not fixed and can vary. 
   
       "If a maxSplitSize is specified, then blocks on the same node are 
combined to form a single split. Blocks that are left over are then combined 
with other blocks in the same rack." 
[hadoop.apache.org](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html?utm_source=chatgpt.com)
   
   This means that the number of paths in a split is determined by the block 
placement and the configuration settings, leading to potential variations in 
the number of paths per split.
   
   This causes the test to sometimes fail depending on the split. As such, the 
test could be reworked to avoid strictly testing for the number of paths in 
each split.
   
   This patch relaxes those assumptions by removing the checks on the number of 
paths in each split and checking only whether the files exist.
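   
   A minimal sketch of the idea, not the actual test code: instead of asserting 
how many paths land in each split, gather the paths from all splits and compare 
them with the expected files. The class `SplitPathCheck`, its methods, and the 
use of plain strings in place of Hadoop `Path` objects are assumptions made for 
illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Hadoop. Each split is modelled as a
// plain list of path strings so the example runs without Hadoop on the
// classpath.
class SplitPathCheck {

    // Flatten the paths of every split into one set, ignoring how the
    // paths happen to be grouped into splits.
    static Set<String> allPaths(List<List<String>> splits) {
        Set<String> paths = new TreeSet<>();
        for (List<String> split : splits) {
            paths.addAll(split);
        }
        return paths;
    }

    // True when the splits, taken together, cover exactly the expected files.
    static boolean coversExactly(List<List<String>> splits, Set<String> expected) {
        return allPaths(splits).equals(expected);
    }

    public static void main(String[] args) {
        Set<String> expected = new TreeSet<>(Arrays.asList("/a", "/b", "/c"));

        // Two different groupings of the same files, as might result from
        // different block placement.
        List<List<String>> groupingOne =
            Arrays.asList(Arrays.asList("/a", "/b"), Arrays.asList("/c"));
        List<List<String>> groupingTwo =
            Arrays.asList(Arrays.asList("/a"), Arrays.asList("/b", "/c"));

        System.out.println(coversExactly(groupingOne, expected));
        System.out.println(coversExactly(groupingTwo, expected));
    }
}
```

   Both groupings pass this check, whereas an assertion on the number of paths 
in the first split would pass for only one of them.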
   
   ### For code changes:
   
   - [X] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [X] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [X] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [X] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> testCombineFileInputFormat is overly constrained and can sometimes fail
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-19714
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19714
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Paco Chan
>            Priority: Trivial
>
> The Hadoop documentation states that the number of paths per split in 
> {{CombineFileInputFormat}} is not fixed and can vary. 
>  
> {quote}"If a maxSplitSize is specified, then blocks on the same node are 
> combined to form a single split. Blocks that are left over are then combined 
> with other blocks in the same rack." 
> [hadoop.apache.org|https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html?utm_source=chatgpt.com]
> {quote}
> This means that the number of paths in a split is determined by the block 
> placement and the configuration settings, leading to potential variations in 
> the number of paths per split.
>  
> This causes the test to sometimes fail depending on the split. As such, the 
> test could be reworked to avoid strictly testing for the number of paths in 
> each split. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
