This makes good sense to me!  (But I'm not sufficiently expert on the code to know for sure; I just know that danger seems to lurk in deadlock land if the detection model doesn't have enough of an understanding of who the actors are and what blocking might do.  It may be that our transactor notion has this case covered too - but - I'd be a little surprised if it does.)

On 11/16/17 9:46 PM, Xikui Wang wrote:
Yes. That deadlock could happen. Currently, we have one-to-one mappings for
the jobs and transactions, except for the feeds.

@Abdullah, after some digging into the code, I think we can probably use a
single transaction id for a job that feeds multiple datasets. See if I can
convince you. :)

The reason we have multiple transaction ids in feeds is that we compile
each connection job separately and combine them into a single feed job. A
new transaction id is created and assigned to each connection job, thus for
the combined job, we have to handle the different transactions as they
are embedded in the connection job specifications. But, what if we create a
single transaction id for the combined job? That transaction id will be
embedded into each connection so they can write logs freely, but the
transaction will be started and committed only once, as there is only one
feed job. In this way, we won't need MultiTransactionJobletEventListenerFactory,
and the transaction id can be removed from the job specification easily as
well (for Steven's change).
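
To make that concrete, here is a rough sketch of what the listener could look
like with a single transaction id. The interfaces and names below are
simplified stand-ins for illustration only, not the actual Hyracks/AsterixDB
APIs:

    // Simplified stand-ins for illustration; not the real interfaces.
    interface TxnManager {
        void begin(long txnId);
        void commit(long txnId);
        void abort(long txnId);
    }

    interface JobletEventListener {
        void jobStarted();
        void jobFinished(boolean success);
    }

    // One transaction id shared by every connection in the combined feed job:
    // begun when the job starts, committed or aborted exactly once when the
    // job finishes, with no per-connection transaction bookkeeping.
    class SingleTxnFeedJobletListener implements JobletEventListener {
        private final long feedTxnId;
        private final TxnManager txnManager;

        SingleTxnFeedJobletListener(long feedTxnId, TxnManager txnManager) {
            this.feedTxnId = feedTxnId;
            this.txnManager = txnManager;
        }

        @Override
        public void jobStarted() {
            txnManager.begin(feedTxnId);
        }

        @Override
        public void jobFinished(boolean success) {
            if (success) {
                txnManager.commit(feedTxnId);
            } else {
                txnManager.abort(feedTxnId);
            }
        }
    }

The connections would still write their log records under feedTxnId, but
begin/commit would happen once per feed job instead of once per connection.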

Best,
Xikui


On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <[email protected]> wrote:

I worry about deadlocks.  The waits-for graph may not understand that
making t1 wait will also make t2 wait, since they may share a thread -
right?  Or do we have jobs and transactions separately represented there
now?
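
Concretely, the scenario I'm imagining looks something like the toy example
below (made-up names, nothing like the real lock manager): t1 and t2 share a
feed thread, so when t1 blocks, t2 is stuck too, but a transaction-level
waits-for graph never records a t2 -> t1 edge and therefore never sees a
cycle:

    import java.util.*;

    // Toy model: edges are transaction-to-transaction only, so the implicit
    // "t2 waits for t1" dependency created by sharing a thread is invisible.
    public class WaitsForSketch {
        public static void main(String[] args) {
            Map<String, String> waitsFor = new HashMap<>();
            waitsFor.put("t1", "t3"); // t1 blocks on a lock held by t3
            waitsFor.put("t3", "t2"); // t3 blocks on a lock held by t2
            // t2 cannot run while its thread is parked inside t1's wait,
            // but no edge t2 -> t1 is ever added.
            System.out.println("cycle detected: " + hasCycle(waitsFor)); // false
        }

        static boolean hasCycle(Map<String, String> edges) {
            for (String start : edges.keySet()) {
                Set<String> seen = new HashSet<>();
                String cur = start;
                while (edges.containsKey(cur)) {
                    if (!seen.add(cur)) {
                        return true;
                    }
                    cur = edges.get(cur);
                }
            }
            return false;
        }
    }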

On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <[email protected]> wrote:

We are using multiple transactions in a single job in the case of feeds,
and I think that this is the correct way.
Having a single job for a feed that feeds into multiple datasets is a good
thing since job resources/feed resources are consolidated.

Here are some points:
- We can't use the same transaction id to feed multiple datasets. The only
other option is to have multiple jobs, each feeding a different dataset.
- Having multiple jobs (in addition to the extra resources used, memory
and CPU) would then force us to either read data from external sources
multiple times, parse records multiple times, etc., or to synchronize the
different jobs and the feed source within asterixdb. IMO, this is far more
complicated than having multiple transactions within a single job, and the
costs far outweigh the benefits.

P.S.,
We are also using this for bucket connections in Couchbase Analytics.

On Nov 16, 2017, at 2:57 PM, Till Westmann <[email protected]> wrote:

If there are a number of issues with supporting multiple transaction ids
and no clear benefits/use-cases, I’d vote for simplification :)
Also, code that’s not being used has a tendency to "rot", and so I think
that its usefulness might be limited by the time we’d find a use for
this functionality.

My 2c,
Till

On 16 Nov 2017, at 13:57, Xikui Wang wrote:

I'm separating the connections into different jobs in some of my
experiments... but that was intended to be used for the experimental
settings (i.e., not for master now)...

I think the interesting question here is whether we want to allow one
Hyracks job to carry multiple transactions. I personally think that should
be allowed, as the transaction and the job are two separate concepts, but I
couldn't find such use cases other than the feeds. Does anyone have a good
example of this?

Another question is, if we do allow multiple transactions in a single
Hyracks job, how do we enable the commit runtime to obtain the correct TXN
id without having it embedded as part of the job specification?
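
One possible direction (just a sketch; TxnIdRegistry and its methods are
made-up names, not existing classes) would be to hand the transaction id to
the node at job-start time and have the commit runtime look it up by job id
when it executes, instead of reading an id compiled into the job spec:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch: a per-node map populated when a job starts and
    // consulted by the commit runtime, keeping the job spec free of txn ids.
    class TxnIdRegistry {
        private static final Map<Long, Long> JOB_TO_TXN = new ConcurrentHashMap<>();

        static void register(long jobId, long txnId) { // at job start
            JOB_TO_TXN.put(jobId, txnId);
        }

        static long lookup(long jobId) { // from the commit runtime
            return JOB_TO_TXN.get(jobId);
        }

        static void unregister(long jobId) { // when the job finishes
            JOB_TO_TXN.remove(jobId);
        }
    }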

Best,
Xikui

On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <[email protected]> wrote:

I am curious as to how feeds will work without this?

~Abdullah.
On Nov 16, 2017, at 12:43 PM, Steven Jacobs <[email protected]> wrote:
Hi all,
We currently have MultiTransactionJobletEventListenerFactory, which allows
one Hyracks job to run multiple Asterix transactions together.

This class is only used by feeds, and feeds are in the process of changing
to no longer need this feature. As part of the work on pre-deploying job
specifications to be used by multiple Hyracks jobs, I've been working on
removing the transaction id from the job specifications, as we use a new
transaction for each invocation of a deployed job.
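
Roughly, the idea is that the deployed specification is compiled once and
contains no transaction id, and each invocation supplies a fresh one at
start time. A sketch with made-up names (DeployedJobSpec and JobRunner are
not the real Hyracks classes):

    // Made-up names for illustration; not the actual Hyracks/AsterixDB classes.
    class DeployedJobSpec {
        // compiled once; carries no transaction id
    }

    class JobRunner {
        private long nextTxnId = 0;

        // Every invocation of the same deployed spec gets its own transaction
        // id, passed at start time instead of being baked into the spec.
        long run(DeployedJobSpec spec) {
            long txnId = ++nextTxnId;
            startJob(spec, txnId);
            return txnId;
        }

        private void startJob(DeployedJobSpec spec, long txnId) {
            // hand txnId to the participating nodes at job start
        }
    }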

There is currently no clear way to remove the transaction id from the job
spec and keep the option for MultiTransactionJobletEventListenerFactory.
The question for the group is, do we see a need to maintain this class,
which will no longer be used by any current code? Or, in other words, is
there a strong possibility that in the future we will want multiple
transactions to share a single Hyracks job, meaning that it is worth
figuring out how to maintain this class?

Steven


