[ https://issues.apache.org/jira/browse/TEZ-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715831#comment-14715831 ]

Jason Lowe commented on TEZ-2628:
---------------------------------

I believe this is a bug in the MemoryTimelineStore.  The patch loads the data 
into the timeline store identically each time: it reads the data serially from 
the HDFS file whenever it refreshes the cache, and on a cache hit it doesn't 
load anything at all.

The problem appears to be with this code:
{noformat}
    Iterator<TimelineEntity> entityIterator = null;
    if (fromId != null) {
      TimelineEntity firstEntity = entities.get(new EntityIdentifier(fromId,
          entityType));
      if (firstEntity == null) {
        return new TimelineEntities();
      } else {
        entityIterator = new TreeSet<TimelineEntity>(entities.values())
            .tailSet(firstEntity, true).iterator();
      }
    }
    if (entityIterator == null) {
      entityIterator = new PriorityQueue<TimelineEntity>(entities.values())
          .iterator();
    }
{noformat}
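To make the ordering difference concrete outside of YARN, here is a minimal standalone sketch.  The class and method names are hypothetical, and plain Integers stand in for TimelineEntity.  It drains the same values through the two kinds of iterator used in the code above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.TreeSet;

public class IteratorOrderDemo {
    // Drain an iterator into a list so the two orders can be compared.
    static <T> List<T> drain(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    // Order produced by the no-fromId path in the quoted code (PriorityQueue).
    static List<Integer> priorityQueueOrder(List<Integer> values) {
        return drain(new PriorityQueue<Integer>(values).iterator());
    }

    // Order produced by the fromId path in the quoted code (TreeSet).
    static List<Integer> treeSetOrder(List<Integer> values) {
        return drain(new TreeSet<Integer>(values).iterator());
    }

    public static void main(String[] args) {
        // Hypothetical sample data; stands in for entities.values().
        List<Integer> values = Arrays.asList(5, 1, 4, 2, 3);
        System.out.println("PriorityQueue: " + priorityQueueOrder(values));
        System.out.println("TreeSet:       " + treeSetOrder(values));
    }
}
```

The TreeSet list comes out ascending.  The PriorityQueue list starts with the smallest value (the heap head) but otherwise follows the heap's internal layout, which is an implementation detail of java.util.PriorityQueue rather than sorted order.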

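Note how it builds a completely different iterator based on whether fromId was 
specified or not.  The TreeSet iterator returns entities in sorted order, but a 
PriorityQueue iterator returns them in no particular order (only the head is 
guaranteed to be the smallest), so the two kinds of query can disagree.  As a 
quick sanity check I changed it to something like this: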
{noformat}
    Iterator<TimelineEntity> entityIterator = null;
    if (fromId != null) {
      TimelineEntity firstEntity = entities.get(new EntityIdentifier(fromId,
          entityType));
      if (firstEntity == null) {
        return new TimelineEntities();
      } else {
        entityIterator = new TreeSet<TimelineEntity>(entities.values())
            .tailSet(firstEntity, true).iterator();
      }
    }
    if (entityIterator == null) {
      entityIterator = new TreeSet<TimelineEntity>(entities.values())
          .iterator();
    }
{noformat}

and the discrepancy between queries with and without the fromId disappeared in 
my experiments.  This mishandling by the MemoryTimelineStore is something we 
could fix in YARN-3942 or as a separate JIRA.
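A similar sketch of the modified lookup shows the property the change restores: a page requested with fromId is exactly the tail of the unfiltered listing, starting at that entity.  Again the names are hypothetical and Integers keyed by ID stand in for entities.  It assumes distinct sort keys, since a TreeSet keeps only one of any elements that compare as equal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class FromIdPagingDemo {
    // Hypothetical simplification of the modified lookup: entity ID -> sortable entity.
    // Both branches iterate a TreeSet, so results come back in sorted order either way.
    static List<Integer> list(Map<String, Integer> entities, String fromId, int limit) {
        Iterator<Integer> it;
        if (fromId != null) {
            Integer first = entities.get(fromId);
            if (first == null) {
                // Unknown fromId: return an empty page, as the quoted code does.
                return Collections.emptyList();
            }
            // Start at the fromId entity (inclusive) and continue in sorted order.
            it = new TreeSet<Integer>(entities.values()).tailSet(first, true).iterator();
        } else {
            // No fromId: same TreeSet ordering, starting from the first element.
            it = new TreeSet<Integer>(entities.values()).iterator();
        }
        List<Integer> page = new ArrayList<>();
        while (it.hasNext() && page.size() < limit) {
            page.add(it.next());
        }
        return page;
    }

    public static void main(String[] args) {
        Map<String, Integer> entities = new HashMap<>();
        entities.put("a", 5);
        entities.put("b", 1);
        entities.put("c", 4);
        entities.put("d", 2);
        entities.put("e", 3);
        System.out.println(list(entities, null, 2)); // [1, 2]
        System.out.println(list(entities, "e", 10)); // [3, 4, 5]
    }
}
```

Concatenating the first page with the page that starts at the next entity ID reproduces the full sorted listing, which is the consistency the PriorityQueue path could not guarantee.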

> History logging plugin to write ATS events to HDFS
> --------------------------------------------------
>
>                 Key: TEZ-2628
>                 URL: https://issues.apache.org/jira/browse/TEZ-2628
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: TEZ-2628.001.patch, TEZ-2628.002.patch, hive-timeline.json
>
>
> This provides another history logging alternative that is conceptually the 
> same as the timeline logging service but logs the entities to a file rather than
> posting the events to the timeline server directly.  When coupled with the 
> timeline store plugin from YARN-3942 it allows the Tez job to be decoupled 
> from the timeline server yet the Tez UI can still function properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
