[
https://issues.apache.org/jira/browse/IMPALA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775405#comment-16775405
]
ASF subversion and git services commented on IMPALA-7019:
---------------------------------------------------------
Commit dce82e4e018d1944ff19bb6f87139b51c1b0287e in impala's branch
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=dce82e4 ]
IMPALA-8178: Disable file handle cache for HDFS erasure coded files
Testing on an erasure coded minicluster has revealed that each
file handle for an erasure coded file uses about 3MB of native
memory. This shows up as "java.nio:type=BufferPool,name=direct"
in the /jmx endpoint (here showing the output when 608 handles
are open):
{
  "name": "java.nio:type=BufferPool,name=direct",
  "modelerType": "sun.management.ManagementFactoryHelper$1",
  "Name": "direct",
  "TotalCapacity": 1921048960,
  "MemoryUsed": 1921048961,
  "Count": 633,
  "ObjectName": "java.nio:type=BufferPool,name=direct"
}
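As a rough check of the "about 3MB" figure, the TotalCapacity reported by the bean can be divided by the 608 open handles quoted above. This is a minimal sketch, not part of the commit: the function name per_handle_bytes is illustrative, and it assumes that essentially all direct buffer capacity is attributable to the open file handles.

```python
import json

# Relevant fields of the bean, copied from the /jmx output above.
BEAN_JSON = """
{
  "name": "java.nio:type=BufferPool,name=direct",
  "TotalCapacity": 1921048960,
  "MemoryUsed": 1921048961,
  "Count": 633
}
"""


def per_handle_bytes(bean: dict, open_handles: int) -> float:
    """Average direct-buffer capacity per open file handle, in bytes.

    Assumption: the whole pool is attributed to the file handles.
    """
    return bean["TotalCapacity"] / open_handles


bean = json.loads(BEAN_JSON)
avg = per_handle_bytes(bean, open_handles=608)
print(f"{avg:.0f} bytes per handle, about {avg / 2**20:.2f} MiB")
```

Under that assumption the average comes out at roughly 3 MiB per handle, consistent with the estimate in the commit message.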
The memory is not released or reduced by a call to unbuffer(),
so these file handles are not suitable for long-term caching.
HDFS-14308 tracks the implementation of unbuffer() for
DFSStripedInputStream. This issue showed up when remote
file handle caching was enabled in IMPALA-7265, as erasure
coded files are always scheduled to be remote (IMPALA-7019).
This disables file handle caching for erasure coded files,
which requires plumbing through the information about which
ScanRanges are accessing erasure coded files.
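The rule described here can be illustrated with a small model: a scan range carries a flag saying whether its file is erasure coded, and the handle lookup bypasses the cache when the flag is set. This is only a sketch of the idea in Python. The names ScanRange.is_erasure_coded, FileHandleCache and get_handle are hypothetical stand-ins, not Impala's actual classes or signatures.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ScanRange:
    path: str
    # Hypothetical flag, assumed to be plumbed through from file metadata.
    is_erasure_coded: bool


@dataclass
class FileHandleCache:
    enabled: bool = True
    handles: dict = field(default_factory=dict)
    opens: int = 0  # counts how many fresh handles were opened

    def _open(self, path: str) -> object:
        self.opens += 1
        return object()  # stand-in for a real HDFS file handle

    def get_handle(self, scan_range: ScanRange) -> object:
        # Erasure coded files never use the cache, because their handles
        # hold native memory that unbuffer() does not release.
        use_cache = self.enabled and not scan_range.is_erasure_coded
        if not use_cache:
            return self._open(scan_range.path)
        if scan_range.path not in self.handles:
            self.handles[scan_range.path] = self._open(scan_range.path)
        return self.handles[scan_range.path]


cache = FileHandleCache()
plain = ScanRange("/warehouse/t/part-0", is_erasure_coded=False)
ec = ScanRange("/warehouse/ec_t/part-0", is_erasure_coded=True)

first = cache.get_handle(plain)
second = cache.get_handle(plain)  # served from the cache
cache.get_handle(ec)              # opened fresh each time
cache.get_handle(ec)
```

In this model the uncached handle would be closed by the caller once the read finishes, so its memory is not held between scans.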
With this change, core tests pass on an erasure coded system.
Change-Id: I8c761e08aacc952de0033a4c91e07f15c8ec96da
Reviewed-on: http://gerrit.cloudera.org:8080/12552
Reviewed-by: Joe McDonnell <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Discard block locations and schedule as remote read with erasure coding
> -----------------------------------------------------------------------
>
> Key: IMPALA-7019
> URL: https://issues.apache.org/jira/browse/IMPALA-7019
> Project: IMPALA
> Issue Type: Sub-task
> Components: Frontend
> Affects Versions: Impala 3.1.0
> Reporter: Tianyi Wang
> Assignee: Tianyi Wang
> Priority: Major
> Fix For: Impala 3.1.0
>
>
> Currently Impala schedules an erasure coded scan the same way as a regular
> HDFS scan: it tries to schedule the scan on a datanode processing the
> block. This makes little sense with erasure coding, so we should schedule it
> as if the block were remote.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)