[
https://issues.apache.org/jira/browse/TEZ-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923457#comment-15923457
]
Siddharth Seth commented on TEZ-1526:
-------------------------------------
I believe the initial cache was created to avoid multiple instances of the same
id when accessed from different locations. From what I can tell, this
implementation preserves that behavior.
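To make the "one instance per id" idea concrete, here is a minimal sketch of a weak interning pool using only the JDK. It is not the code from the patch: DemoTaskId and DemoIdInterner are hypothetical names invented for illustration, and the real TezTaskID has more fields than the two shown here.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.Objects;
import java.util.WeakHashMap;

// Hypothetical stand-in for an id type; this is NOT Tez's TezTaskID.
final class DemoTaskId {
    final int vertex;
    final int task;

    DemoTaskId(int vertex, int task) {
        this.vertex = vertex;
        this.task = task;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof DemoTaskId)) {
            return false;
        }
        DemoTaskId other = (DemoTaskId) o;
        return vertex == other.vertex && task == other.task;
    }

    @Override
    public int hashCode() {
        return Objects.hash(vertex, task);
    }
}

// Hypothetical interner: hands back one canonical instance per distinct id.
final class DemoIdInterner {
    // Keys are weakly held by WeakHashMap; values are WeakReferences so the
    // map itself never keeps an id strongly reachable.
    private final Map<DemoTaskId, WeakReference<DemoTaskId>> pool = new WeakHashMap<>();

    synchronized DemoTaskId intern(DemoTaskId candidate) {
        WeakReference<DemoTaskId> ref = pool.get(candidate);
        DemoTaskId existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing; // an equal id is already pooled, reuse it
        }
        pool.put(candidate, new WeakReference<>(candidate));
        return candidate;
    }
}
```

Callers that intern equal ids from different locations get the same object back for as long as something still references it. Once nothing does, the entry becomes eligible for collection.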
One question:
bq. There are jobs that were impossible to run before that are now possible.
Why is it possible to run much larger jobs now? Won't TaskAttempt hold a
reference to TaskAttemptId, which makes it ineligible for garbage collection
while a job is running?
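The concern can be illustrated with a small sketch using plain JDK classes. DemoReachability and DemoAttempt are hypothetical names, not Tez classes. The sketch shows only the general rule that a weakly cached object cannot be cleared while some live object still holds a strong reference to it.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

final class DemoReachability {
    // Hypothetical stand-in for a task attempt that keeps its id for the
    // lifetime of the job.
    static final class DemoAttempt {
        final Object id;

        DemoAttempt(Object id) {
            this.id = id;
        }
    }

    // Returns true when the weakly cached id is still present after a GC
    // request, because a live attempt still references it strongly.
    static boolean idSurvivesGcWhileAttemptLive() {
        List<DemoAttempt> liveAttempts = new ArrayList<>();
        Object id = new Object();
        WeakReference<Object> cacheEntry = new WeakReference<>(id);
        liveAttempts.add(new DemoAttempt(id));
        id = null; // drop the local; only the attempt holds the id strongly

        System.gc(); // a hint only, but a strongly reachable object is never cleared

        boolean stillCached = cacheEntry.get() != null;
        // Reading through liveAttempts here keeps the attempt reachable
        // across the GC request above.
        return stillCached && liveAttempts.get(0).id != null;
    }
}
```

Under that rule, a weak cache on its own would free ids only after the objects holding them are released, which is the point the question raises.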
Other than this, looks good to me.
> LoadingCache for TezTaskID slow for large jobs
> ----------------------------------------------
>
> Key: TEZ-1526
> URL: https://issues.apache.org/jira/browse/TEZ-1526
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Labels: performance
> Attachments: 100000-TezTaskIDs.patch, hamlet.txt, TEZ-1526.3.patch,
> TEZ-1526.4.patch, TEZ-1526.5.patch, TEZ-1526.6.patch, TEZ-1526.7.patch,
> TEZ-1526.8.patch, TEZ-1526.memory.test.patch, TEZ-1526-v1.patch,
> TEZ-1526-v2.patch
>
>
> Using the LoadingCache with default builder settings, 100,000 TezTaskIDs are
> created in 10 seconds on my setup. With a LoadingCache initialCapacity of
> 10,000, they are created in 300 ms. With no LoadingCache, they are created in
> 10 ms. A test case is attached to illustrate the condition I would like to be
> sped up.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)