[
https://issues.apache.org/jira/browse/HIVE-24717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277535#comment-17277535
]
Mustafa İman commented on HIVE-24717:
-------------------------------------
This caused many precommit tests to fail. The reason is as follows:
# Majority of tests run on local file system which is a ChecksumFileSystem.
# ChecksumFileSystem#listStatusIterator() does not ignore checksum files
(.A.crc files)
# listStatusIterator's next returns A first, .A.crc second.
# .A.crc files are automatically deleted by ChecksumFileSystem when file A is
deleted.
# Since we delete file A before the iterator processes .A.crc, the iterator
cannot find .A.crc when it checks its permissions.
# On Linux and macOS, LocalFileSystem invokes "ls -ld .A.crc" to get the
permissions.
# This command exits with code 2 on Linux, and the iterator simply throws.
Interestingly, the tests run fine on my computer because I use a Mac. On
macOS, "ls -ld" exits with code 1 for the same error. Hadoop ignores exit code
1 but not 2, so the tests pass on macOS but fail on Linux.
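The sequence in the list above can be reproduced without Hadoop. The following is a minimal sketch using only java.nio, not Hadoop's actual code: the class CrcListingSketch and its methods are hypothetical stand-ins, deleteWithChecksum only imitates the way ChecksumFileSystem removes the .crc file together with its data file, and a plain attribute read stands in for the "ls -ld" permission check.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical simulation; none of these names come from Hadoop or Hive.
class CrcListingSketch {

    // Path of the hidden checksum sidecar for a data file: A -> .A.crc
    static Path sidecarOf(Path file) {
        return file.resolveSibling("." + file.getFileName() + ".crc");
    }

    // Imitates the described behavior: deleting A also deletes .A.crc.
    static void deleteWithChecksum(Path file) throws IOException {
        Files.deleteIfExists(file);
        Files.deleteIfExists(sidecarOf(file));
    }

    // Lists the directory once, then stats each entry lazily and deletes
    // data files as it goes, the way the move code consumes the iterator.
    static List<String> processAll(Path dir) throws IOException {
        List<Path> snapshot = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                snapshot.add(p);
            }
        }
        // Force the order from the comment: data files first, sidecars second.
        snapshot.sort(Comparator
                .comparing((Path p) -> p.getFileName().toString().startsWith("."))
                .thenComparing(p -> p.getFileName().toString()));

        List<String> events = new ArrayList<>();
        for (Path p : snapshot) {
            String name = p.getFileName().toString();
            try {
                // Stand-in for the permission lookup done per entry.
                Files.readAttributes(p, BasicFileAttributes.class);
                events.add("ok " + name);
                if (!name.startsWith(".")) {
                    deleteWithChecksum(p);
                }
            } catch (NoSuchFileException e) {
                // The sidecar vanished when its data file was deleted.
                events.add("missing " + name);
            }
        }
        return events;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("crc-sketch");
        Files.write(dir.resolve("A"), new byte[] {1});
        Files.write(dir.resolve(".A.crc"), new byte[] {2});
        System.out.println(processAll(dir)); // prints [ok A, missing .A.crc]
        Files.delete(dir); // directory is empty at this point
    }
}
```

In this sketch the missing entry is caught and recorded. In the failing tests the equivalent lookup on Linux surfaces as an exception from the iterator instead.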
Apparently this is fixed in Hadoop 3.2 via
https://issues.apache.org/jira/browse/HADOOP-12502 . We are using Hadoop 3.1.0.
We need to either upgrade to Hadoop 3.2 or get the fix backported to the
Hadoop 3.1 line.
> Migrate to listStatusIterator in moving files
> ---------------------------------------------
>
> Key: HIVE-24717
> URL: https://issues.apache.org/jira/browse/HIVE-24717
> Project: Hive
> Issue Type: Improvement
> Reporter: Mustafa İman
> Assignee: Mustafa İman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Hive.java makes various HDFS listStatus calls when moving
> files/directories around. These code paths are used for insert overwrite
> table/partition queries.
> listStatus is a blocking call, whereas listStatusIterator is backed by a
> RemoteIterator and fetches pages in the background. Hive should take
> advantage of that, since Hadoop recently implemented listStatusIterator
> for S3: https://issues.apache.org/jira/browse/HADOOP-17074