Re: MultiTransactionJobletEventListenerFactory

Murtadha Hubail Fri, 17 Nov 2017 07:12:21 -0800

For atomic transactions, the change was merged yesterday. For entity level 
transactions, it should be a very small change.


Cheers,
Murtadha

> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <[email protected]> wrote:
> 
> I understand that is not the case right now, but is that what you're working on?
> 
> Cheers,
> Abdullah.
> 
> 
>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <[email protected]> wrote:
>> 
>> A transaction context can register multiple primary indexes.
>> Since each entity commit log contains the dataset id, you can decrement the
>> active operations on the operation tracker associated with that dataset id.
>> 
>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <[email protected]> wrote:
>> 
>>   Can you illustrate how a deadlock can happen? I am anxious to know.
>>   Moreover, the reason for the multiple transaction ids in feeds is not
>>   simply because we compile them differently.
>> 
>>   How would a commit operator know which dataset active operation counter to
>>   decrement if they share the same id, for example?
>> 
>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <[email protected]> wrote:
>>> 
>>> Yes. That deadlock could happen. Currently, we have one-to-one mappings for
>>> the jobs and transactions, except for the feeds.
>>> 
>>> @Abdullah, after some digging into the code, I think probably we can use a
>>> single transaction id for the job which feeds multiple datasets? See if I
>>> can convince you. :)
>>> 
>>> The reason we have multiple transaction ids in feeds is that we compile
>>> each connection job separately and combine them into a single feed job. A
>>> new transaction id is created and assigned to each connection job, thus for
>>> the combined job, we have to handle the different transactions as they
>>> are embedded in the connection job specifications. But, what if we create a
>>> single transaction id for the combined job? That transaction id will be
>>> embedded into each connection so they can write logs freely, but the
>>> transaction will be started and committed only once as there is only one
>>> feed job. In this way, we won't need multiTransactionJobletEventListener
>>> and the transaction id can be removed from the job specification easily as
>>> well (for Steven's change).
>>> 
>>> Best,
>>> Xikui
>>> 
>>> 
>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <[email protected]> wrote:
>>>> 
>>>> I worry about deadlocks.  The waits-for graph may not understand that
>>>> making t1 wait will also make t2 wait since they may share a thread -
>>>> right?  Or do we have jobs and transactions separately represented there
>>>> now?
>>>> 
>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <[email protected]> wrote:
>>>>> 
>>>>> We are using multiple transactions in a single job in case of feed and I
>>>>> think that this is the correct way.
>>>>> Having a single job for a feed that feeds into multiple datasets is a good
>>>>> thing since job resources/feed resources are consolidated.
>>>>> 
>>>>> Here are some points:
>>>>> - We can't use the same transaction id to feed multiple datasets. The only
>>>>> other option is to have multiple jobs each feeding a different dataset.
>>>>> - Having multiple jobs (in addition to the extra resources used, memory
>>>>> and CPU) would then force us to either read data from external sources
>>>>> multiple times, parse records multiple times, etc.,
>>>>> or to have synchronization between the different jobs and the
>>>>> feed source within AsterixDB. IMO, this is far more complicated than having
>>>>> multiple transactions within a single job, and the costs far outweigh the
>>>>> benefits.
>>>>> 
>>>>> P.S,
>>>>> We are also using this for bucket connections in Couchbase Analytics.
>>>>> 
>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <[email protected]> wrote:
>>>>>> 
>>>>>> If there are a number of issues with supporting multiple transaction ids
>>>>>> and no clear benefits/use-cases, I’d vote for simplification :)
>>>>>> Also, code that’s not being used has a tendency to "rot" and so I think
>>>>>> that its usefulness might be limited by the time we’d find a use for
>>>>>> this functionality.
>>>>>> 
>>>>>> My 2c,
>>>>>> Till
>>>>>> 
>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>>>>> 
>>>>>>> I'm separating the connections into different jobs in some of my
>>>>>>> experiments... but that was intended to be used for the experimental
>>>>>>> settings (i.e., not for master now)...
>>>>>>> 
>>>>>>> I think the interesting question here is whether we want to allow one
>>>>>>> Hyracks job to carry multiple transactions. I personally think that should
>>>>>>> be allowed as the transaction and job are two separate concepts, but I
>>>>>>> couldn't find such use cases other than the feeds. Does anyone have a good
>>>>>>> example of this?
>>>>>>> 
>>>>>>> Another question is, if we do allow multiple transactions in a single
>>>>>>> Hyracks job, how do we enable the commit runtime to obtain the correct TXN id
>>>>>>> without having that embedded as part of the job specification?
>>>>>>> 
>>>>>>> Best,
>>>>>>> Xikui
>>>>>>> 
>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <[email protected]> wrote:
>>>>>>> 
>>>>>>>> I am curious as to how feeds will work without this?
>>>>>>>> 
>>>>>>>> ~Abdullah.
>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory, which allows
>>>>>>>>> for one Hyracks job to run multiple Asterix transactions together.
>>>>>>>>> 
>>>>>>>>> This class is only used by feeds, and feeds are in the process of changing to
>>>>>>>>> no longer need this feature. As part of the work in pre-deploying job
>>>>>>>>> specifications to be used by multiple Hyracks jobs, I've been working on
>>>>>>>>> removing the transaction id from the job specifications, as we use a new
>>>>>>>>> transaction for each invocation of a deployed job.
>>>>>>>>> 
>>>>>>>>> There is currently no clear way to remove the transaction id from the job
>>>>>>>>> spec and keep the option for MultiTransactionJobletEventListenerFactory.
>>>>>>>>> 
>>>>>>>>> The question for the group is, do we see a need to maintain this class that
>>>>>>>>> will no longer be used by any current code? Or, in other words, is there a
>>>>>>>>> strong possibility that in the future we will want multiple transactions to
>>>>>>>>> share a single Hyracks job, meaning that it is worth figuring out how to
>>>>>>>>> maintain this class?
>>>>>>>>> 
>>>>>>>>> Steven
>>>>>>>> 
>>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
>> 
>> 
>> 
>
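
The dataset-id lookup described in the thread (one transaction context registering several primary indexes, with each entity commit log carrying the dataset id used to find the operation tracker to decrement) could be sketched roughly as below. This is only an illustration under stated assumptions: the class and method names (OperationTracker, EntityCommitLog, TransactionContext, registerPrimaryIndex, modify, onEntityCommit) are hypothetical and are not taken from the AsterixDB code base, and the sketch ignores locking, logging, and concurrent registration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a per-dataset operation tracker.
final class OperationTracker {
    private final AtomicInteger activeOperations = new AtomicInteger();

    void beforeOperation() { activeOperations.incrementAndGet(); }
    void afterEntityCommit() { activeOperations.decrementAndGet(); }
    int activeOperations() { return activeOperations.get(); }
}

// Hypothetical entity commit log record; only the ids matter for this sketch.
final class EntityCommitLog {
    final long txnId;
    final int datasetId;

    EntityCommitLog(long txnId, int datasetId) {
        this.txnId = txnId;
        this.datasetId = datasetId;
    }
}

// Hypothetical transaction context shared by all datasets fed by one job.
final class TransactionContext {
    private final long txnId;
    // One tracker per registered primary index, keyed by dataset id.
    private final Map<Integer, OperationTracker> trackersByDataset = new HashMap<>();

    TransactionContext(long txnId) { this.txnId = txnId; }

    void registerPrimaryIndex(int datasetId) {
        trackersByDataset.computeIfAbsent(datasetId, id -> new OperationTracker());
    }

    // Starts an operation on a dataset and returns the log record that
    // would later be written for its entity commit.
    EntityCommitLog modify(int datasetId) {
        tracker(datasetId).beforeOperation();
        return new EntityCommitLog(txnId, datasetId);
    }

    // The dataset id in the log selects which counter to decrement,
    // even though every dataset shares the same transaction id.
    void onEntityCommit(EntityCommitLog log) {
        tracker(log.datasetId).afterEntityCommit();
    }

    int activeOperations(int datasetId) {
        return tracker(datasetId).activeOperations();
    }

    private OperationTracker tracker(int datasetId) {
        OperationTracker t = trackersByDataset.get(datasetId);
        if (t == null) {
            throw new IllegalStateException("dataset not registered: " + datasetId);
        }
        return t;
    }
}

final class SharedTxnSketch {
    public static void main(String[] args) {
        TransactionContext ctx = new TransactionContext(1L);
        ctx.registerPrimaryIndex(10);
        ctx.registerPrimaryIndex(20);
        EntityCommitLog first = ctx.modify(10);
        ctx.modify(20);
        ctx.onEntityCommit(first);
        System.out.println("dataset 10 active: " + ctx.activeOperations(10));
        System.out.println("dataset 20 active: " + ctx.activeOperations(20));
    }
}
```

In this sketch the commit path never needs a per-dataset transaction id; the dataset id carried by the log record is enough to pick the right counter.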
