This sounds like hotspotting. Ideally the workload over the keyspace can be 
better distributed, which is another avenue of attack - partitioning, keying 

> On Oct 13, 2016, at 6:10 PM, Pat Ferrel wrote:
> The DAG for a template just happens to schedule 2 tasks to execute in 
> parallel, each doing something like this:
> val fieldsRDD: RDD[(ItemID, PropertyMap)] = PEventStore.aggregateProperties(
>   appName = dsp.appName,
>   entityType = "item")(sc)
> The PEventStore calls from 2 separate closures start hitting HBase and it 
> fails, no matter how high I set the RPC and Scanner Timeout. 
> This has only come up recently with some restructuring, which I assume caused 
> the 2 tasks to end up at the same point in the DAG. Is there a way to force 
> one HBase-related task to complete before the other is started? They both 
> return RDDs, which are lazily evaluated, like promises, until the data is needed. 
> Can I force the promise to be kept?
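
On forcing the evaluation: one common way to make Spark compute an RDD at a known point is to persist it and then run an action on it, so the second read is only issued after the first has finished. Below is a minimal sketch under stated assumptions, not a tested fix for this template. The `materialize` helper and the object name are hypothetical, and the two `parallelize` calls are stand-ins for the two `PEventStore.aggregateProperties(...)` results. It also assumes both actions are issued from the same driver thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object ForceSequentialReads {
  // Hypothetical helper: mark the RDD as persisted, then run an action so
  // that every partition is computed (and cached) before this call returns.
  def materialize[T](rdd: RDD[T]): RDD[T] = {
    val cached = rdd.persist(StorageLevel.MEMORY_AND_DISK)
    cached.count() // action: blocks until all partitions have been computed
    cached
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("force-eval").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Stand-ins (assumption) for the two PEventStore.aggregateProperties results.
    val first: RDD[(String, Int)] = sc.parallelize(Seq("a" -> 1, "b" -> 2))
    val second: RDD[(String, Int)] = sc.parallelize(Seq("c" -> 3))

    val firstDone = materialize(first)   // first read completes here
    val secondDone = materialize(second) // second read starts only afterwards

    // Downstream stages reuse the cached partitions instead of re-reading.
    println(firstDone.union(secondDone).count())

    sc.stop()
  }
}
```

The trade-off is that persisted partitions cost memory or disk, and if an executor is lost Spark will recompute the missing partitions from the source, which means going back to HBase. This only orders the two reads; it does not address the hotspotting itself.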
