> On March 5, 2014, 6:03 p.m., Purshotam Shah wrote:
> > core/src/main/java/org/apache/oozie/coord/CoordELFunctions.java, line 962
> > <https://reviews.apache.org/r/18762/diff/1/?file=510568#file510568line962>
> >
> >     I see some issue with this. Earlier we start from end and break the for 
> > loop if nominalInstanceCal.compareTo(initInstance).
> >     There are still some instance being return, but not in this case.
> >     
> >     Eg 
> >        <input-events>
> >             <data-in name="input" dataset="raw-logs">
> >                 <start-instance>${coord:current(-200)}</start-instance>
> >                 <end-instance>${coord:current(0)}</end-instance>
> >             </data-in>
> >         </input-events>
> >     
> >     Where coord:current(-10) < initInstance.
> >     
> >     Earlier it will return -9 to 0, but now none.
> >     
> >     
> >
> 
> shwethags wrote:
>     Actually, instead of returning partial instances and user wondering what 
> went wrong with this coord action, we should add this check at coord submit.  
> Coord submit should fail if the instances span before/after dataset time 
> interval
> 
> Purshotam Shah wrote:
>     No. User cloud be using this. User might want to process full data + 
> increment data every time.
>     
>     They will set start-instance as coord:current(-1000000) (-very high 
> number), and end-instance as current(0), so that they can process full feed + 
> incremental feed.
>     
>     This cloud be one way to process all data from start date to current(0). 
>

Better thing to do in this case will be to expose an EL function that returns 
the first instance. Till that feature is available, we should support the 
usecase that you mentioned. I will update tha patch accordingly.


> On March 5, 2014, 6:03 p.m., Purshotam Shah wrote:
> > core/src/main/java/org/apache/oozie/coord/CoordELFunctions.java, line 961
> > <https://reviews.apache.org/r/18762/diff/1/?file=510568#file510568line961>
> >
> >     This can be moved outside from for loop. since you are starting from 
> > start and it's increasing.
> 
> shwethags wrote:
>     I will add check in coord submit. So, this check won't be required anymore

This check can't be moved outside for loop for cases when start count is -ve. 
This can evaluate to an instance before the dataset start


- shwethags


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18762/#review36237
-----------------------------------------------------------


On March 5, 2014, 5:28 a.m., shwethags wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18762/
> -----------------------------------------------------------
> 
> (Updated March 5, 2014, 5:28 a.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-1709
>     https://issues.apache.org/jira/browse/OOZIE-1709
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> CoordELFunctions.getCurrentInstance() has this code:
>     while (current.compareTo(calEffectiveTime) <= 0) {
>             current = (Calendar) origCurrent.clone();
>             instanceCount[0]++;
>             current.add(dsTimeUnit.getCalendarUnit(), instanceCount[0] * 
> dsFreq);
>         }
> 
> For coords with smaller frequency and start time in very past, this is very 
> expensive. On prod, we have seen materialisation of each instance taking few 
> mins sometimes for coords with 1 min frequency
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/coord/CoordELFunctions.java d73bc7d 
>   core/src/test/java/org/apache/oozie/coord/TestCoordELFunctions.java 34a428f 
> 
> Diff: https://reviews.apache.org/r/18762/diff/
> 
> 
> Testing
> -------
> 
> UT
> 
> 
> Thanks,
> 
> shwethags
> 
>

Reply via email to