This sounds like hotspotting. Ideally the workload over the keyspace can be
better distributed, which is another avenue of attack - partitioning, keying
> On Oct 13, 2016, at 6:10 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> The DAG for a template just happens to schedule 2 tasks that do something
> like this:
>   val fieldsRDD: RDD[(ItemID, PropertyMap)] = PEventStore.aggregateProperties(
>     appName = dsp.appName,
>     entityType = "item")(sc)
> to execute in parallel
> The PEventStore calls from 2 separate closures start hitting HBase and it
> fails, no matter how high I set the RPC and Scanner Timeout.
> This has only come up recently with some restructuring, which I assume caused
> the 2 tasks to end up at the same point in the DAG. Is there a way to force
> one HBase-related task to complete before the other is started? They both
> return RDDs, which are lazily evaluated like promises until the data is needed.
> Can I force the promise to be kept?
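
On forcing evaluation: one common approach is to persist the first RDD and then
run an action on it, since an action blocks the driver until every partition has
been computed. Below is a minimal sketch of that idea, not a tested fix for this
template. The helper name `materialize` is hypothetical, and the two
`sc.parallelize` calls are stand-ins for the two
`PEventStore.aggregateProperties` results so the example runs locally without
HBase.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object ForceSequentialReads {
  // Hypothetical helper: mark the RDD for caching, then run an action so it
  // is computed (and cached) now rather than when a later stage needs it.
  def materialize[T](rdd: RDD[T]): RDD[T] = {
    rdd.persist(StorageLevel.MEMORY_AND_DISK)
    rdd.count() // blocks until all partitions have been computed
    rdd
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("force-sequential").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-ins for the two HBase-backed RDDs (assumption for illustration).
      val first = materialize(sc.parallelize(Seq("i1" -> "a", "i2" -> "b")))
      // The second read is only triggered after the first has finished.
      val second = materialize(sc.parallelize(Seq("i1" -> "x", "i3" -> "y")))

      // Downstream work now reads from the cached copies, not the source.
      val joined = first.join(second)
      println(joined.count())
    } finally {
      sc.stop()
    }
  }
}
```

The `persist` call matters: without it, the `count()` would run the scan once
and any later use of the RDD would run it again. The trade-off is that the two
reads no longer overlap, and the cached data needs memory or local disk on the
executors.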