Paco Chan created HADOOP-19714:
----------------------------------
Summary: testCombineFileInputFormat is overly constrained and can
sometimes fail
Key: HADOOP-19714
URL: https://issues.apache.org/jira/browse/HADOOP-19714
Project: Hadoop Common
Issue Type: Bug
Reporter: Paco Chan
The Hadoop documentation for {{CombineFileInputFormat}} indicates that the
number of paths per split is not fixed and can vary:
{quote}"If a maxSplitSize is specified, then blocks on the same node are
combined to form a single split. Blocks that are left over are then combined
with other blocks in the same rack."
[hadoop.apache.org|https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html]
{quote}
This means that the number of paths in a split is determined by the block
placement and the configuration settings, leading to potential variations in
the number of paths per split.
As a result, {{testCombineFileInputFormat}} fails intermittently whenever the
splits are composed differently from what it expects. The test could be
reworked so that it no longer asserts an exact number of paths in each split.
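One possible shape for a relaxed check is sketched below. It is only an illustration, not the existing test code: {{SplitAssertions}} and {{coversExactly}} are hypothetical names, and each split is modeled as a plain list of path strings (standing in for what {{CombineFileSplit.getPaths()}} returns). The idea is to assert that the splits together cover exactly the expected input paths, without caring how many paths land in any one split.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hadoop. Each split is modeled as the list
// of path strings it contains; a real test would build these lists from the
// paths reported by each CombineFileSplit.
final class SplitAssertions {

    private SplitAssertions() {}

    // Returns true when every split is non-empty and the union of paths over
    // all splits equals the expected input set, regardless of how the paths
    // are distributed among the splits. A path may appear in several splits,
    // since blocks of one file can be placed in different splits.
    static boolean coversExactly(List<List<String>> splits, Set<String> expected) {
        Set<String> seen = new HashSet<>();
        for (List<String> split : splits) {
            if (split.isEmpty()) {
                return false;
            }
            seen.addAll(split);
        }
        return seen.equals(expected);
    }
}
```

With a check of this kind, a run that produces one split holding three paths and a run that produces two splits holding the same three paths between them would both pass, while a missing path or an empty split would still be reported.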
--
This message was sent by Atlassian Jira
(v8.20.10#820010)