Can you illustrate how a deadlock can happen? I am anxious to know.
Moreover, the reason for the multiple transaction ids in feeds is not
simply that we compile them differently. How would a commit operator
know which dataset's active-operation counter to decrement if they
shared the same id, for example?
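
To make the scenario concrete, here is a minimal, hypothetical Java
sketch (toy class names, not the actual AsterixDB lock manager) of the
situation Mike describes below: t1 and t2 belong to the same feed job
and run on the same pipeline thread, t1 waits for a lock held by an
unrelated transaction t3, and t3 waits for a lock held by t2. A
waits-for graph that only tracks transaction-to-transaction waits sees
no cycle, yet t2 can never run again to release its lock, because its
thread is parked inside t1's lock wait.

    import java.util.*;

    // Toy waits-for graph: nodes are transaction ids, an edge t -> u means
    // "t waits for a lock held by u". Hypothetical and simplified; not the
    // real lock manager.
    class WaitsForGraph {
        private final Map<String, Set<String>> edges = new HashMap<>();

        void addWait(String waiter, String holder) {
            edges.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
        }

        // Deadlock detection here is just cycle detection over transactions.
        boolean hasCycle() {
            Set<String> visiting = new HashSet<>(), done = new HashSet<>();
            for (String t : edges.keySet()) {
                if (dfs(t, visiting, done)) {
                    return true;
                }
            }
            return false;
        }

        private boolean dfs(String t, Set<String> visiting, Set<String> done) {
            if (done.contains(t)) {
                return false;
            }
            if (!visiting.add(t)) {
                return true; // back edge -> cycle
            }
            for (String u : edges.getOrDefault(t, Set.of())) {
                if (dfs(u, visiting, done)) {
                    return true;
                }
            }
            visiting.remove(t);
            done.add(t);
            return false;
        }
    }

    class SharedThreadDeadlock {
        public static void main(String[] args) {
            WaitsForGraph g = new WaitsForGraph();
            // t1 and t2 are carried by the same feed job and share one thread.
            // t1 blocks on a lock held by an unrelated transaction t3 ...
            g.addWait("t1", "t3");
            // ... and t3 is waiting for a lock held by t2.
            g.addWait("t3", "t2");
            // The graph sees no cycle, because t2 is not waiting on anything.
            System.out.println("cycle detected? " + g.hasCycle()); // false
            // But t2 shares its thread with t1, which is parked inside its
            // lock wait, so t2 never runs again to release the lock t3 needs:
            // the job stalls even though no deadlock is reported.
        }
    }

Whether the real lock manager is exposed to exactly this depends on how
lock waits and job threads interact, which is what the question above is
getting at.
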
> On Nov 16, 2017, at 9:46 PM, Xikui Wang <[email protected]> wrote:
>
> Yes. That deadlock could happen. Currently, we have one-to-one mappings
> between jobs and transactions, except for the feeds.
>
> @Abdullah, after some digging into the code, I think we can probably use
> a single transaction id for a job which feeds multiple datasets? See if
> I can convince you. :)
>
> The reason we have multiple transaction ids in feeds is that we compile
> each connection job separately and combine them into a single feed job.
> A new transaction id is created and assigned to each connection job,
> thus for the combined job we have to handle the different transactions
> as they are embedded in the connection job specifications. But what if
> we created a single transaction id for the combined job? That
> transaction id would be embedded into each connection so they could
> write logs freely, but the transaction would be started and committed
> only once, as there is only one feed job. In this way, we wouldn't need
> MultiTransactionJobletEventListener, and the transaction id could easily
> be removed from the job specification as well (for Steven's change).
>
> Best,
> Xikui
>
> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <[email protected]> wrote:
>
>> I worry about deadlocks. The waits-for graph may not understand that
>> making t1 wait will also make t2 wait, since they may share a thread -
>> right? Or do we have jobs and transactions separately represented there
>> now?
>>
>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <[email protected]> wrote:
>>
>>> We are using multiple transactions in a single job in the case of
>>> feeds, and I think that this is the correct way.
>>> Having a single job for a feed that feeds into multiple datasets is a
>>> good thing, since job resources/feed resources are consolidated.
>>>
>>> Here are some points:
>>> - We can't use the same transaction id to feed multiple datasets. The
>>> only other option is to have multiple jobs, each feeding a different
>>> dataset.
>>> - Having multiple jobs (in addition to the extra memory and CPU used)
>>> would then force us either to read data from external sources multiple
>>> times, parse records multiple times, etc., or to synchronize between
>>> the different jobs and the feed source within AsterixDB. IMO, this is
>>> far more complicated than having multiple transactions within a single
>>> job, and the costs far outweigh the benefits.
>>>
>>> P.S.
>>> We are also using this for bucket connections in Couchbase Analytics.
>>>
>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <[email protected]> wrote:
>>>>
>>>> If there are a number of issues with supporting multiple transaction
>>>> ids and no clear benefits/use-cases, I'd vote for simplification :)
>>>> Also, code that's not being used has a tendency to "rot", so I think
>>>> its usefulness might be limited by the time we'd find a use for this
>>>> functionality.
>>>>
>>>> My 2c,
>>>> Till
>>>>
>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>>
>>>>> I'm separating the connections into different jobs in some of my
>>>>> experiments... but that was intended to be used for the experimental
>>>>> settings (i.e., not for master now)...
>>>>>
>>>>> I think the interesting question here is whether we want to allow
>>>>> one Hyracks job to carry multiple transactions. I personally think
>>>>> that should be allowed, as the transaction and the job are two
>>>>> separate concepts, but I couldn't find such use cases other than the
>>>>> feeds. Does anyone have a good example of this?
>>>>>
>>>>> Another question is, if we do allow multiple transactions in a
>>>>> single Hyracks job, how do we enable the commit runtime to obtain
>>>>> the correct TXN id without having it embedded as part of the job
>>>>> specification?
>>>>>
>>>>> Best,
>>>>> Xikui
>>>>>
>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> I am curious as to how feeds will work without this?
>>>>>>
>>>>>> ~Abdullah.
>>>>>>
>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>> We currently have MultiTransactionJobletEventListenerFactory,
>>>>>>> which allows one Hyracks job to run multiple Asterix transactions
>>>>>>> together.
>>>>>>>
>>>>>>> This class is only used by feeds, and feeds are in the process of
>>>>>>> changing to no longer need this feature. As part of the work on
>>>>>>> pre-deploying job specifications to be used by multiple Hyracks
>>>>>>> jobs, I've been working on removing the transaction id from the
>>>>>>> job specifications, as we use a new transaction for each
>>>>>>> invocation of a deployed job.
>>>>>>>
>>>>>>> There is currently no clear way to remove the transaction id from
>>>>>>> the job spec and keep the option for
>>>>>>> MultiTransactionJobletEventListenerFactory.
>>>>>>>
>>>>>>> The question for the group is, do we see a need to maintain this
>>>>>>> class that will no longer be used by any current code? Or, in
>>>>>>> other words, is there a strong possibility that in the future we
>>>>>>> will want multiple transactions to share a single Hyracks job,
>>>>>>> meaning that it is worth figuring out how to maintain this class?
>>>>>>>
>>>>>>> Steven
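
On the question of how the commit runtime would learn the correct TXN id
once it is no longer embedded in the job specification, one possible
shape, sketched below with purely hypothetical names (none of these are
the real Hyracks/AsterixDB interfaces), is to keep the dataset id as a
compile-time constant in the operator and hand the transaction id over
as a per-invocation job parameter:

    // Hypothetical sketch of a commit runtime that resolves its transaction id
    // at invocation time instead of reading it from the compiled job spec.
    final class CommitRuntimeSketch {

        // Stand-in for whatever per-invocation parameters accompany a job start.
        interface JobParameters {
            long transactionId();                       // one id per invocation (or per feed job)
            void decrementActiveOpCount(int datasetId); // per-dataset bookkeeping
        }

        private final int datasetId; // baked into the (reusable) job spec
        private long txnId;          // resolved when this invocation opens

        CommitRuntimeSketch(int datasetId) {
            this.datasetId = datasetId;
        }

        void open(JobParameters params) {
            // Same job spec, different invocation => different transaction id.
            txnId = params.transactionId();
        }

        void close(JobParameters params) {
            // A commit record would carry (txnId, datasetId); in this sketch the
            // active-operation counter is keyed on the dataset id, which the
            // operator knows without consulting the transaction id at all.
            params.decrementActiveOpCount(datasetId);
        }

        long currentTxnId() {
            return txnId;
        }
    }

Under a sketch like this, a pre-deployed specification could be invoked
many times with a fresh transaction id each time, and the
active-operation counter from the top of the thread would be keyed on
the dataset id the operator already carries rather than on the
transaction id; whether that fits the actual commit and recovery path is
exactly what this thread is debating.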
