Sorry for not replying sooner; I am catching up on an email backlog.

Xikui is right: the socket adapter would be a poor choice for saturating the 
nodes. I would think something like the firehose adapter would be best for 
that, and you can run it on multiple NCs.
Another option would be to use the filesystem feed, or to update the socket 
adapter to accept and process multiple connections (I would go with the 
latter, since it is the most interesting option and you would end up with a 
very useful adapter).
Multiple feeds running concurrently should work too.
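To make the "accept and process multiple connections" idea concrete, here is a minimal sketch in plain Python (not AsterixDB internals; all names here are illustrative): the server hands each accepted client to its own thread, so a second client can stream records while the first is still connected, instead of being blocked behind it.

```python
# Sketch of a socket server that handles multiple clients concurrently,
# one thread per accepted connection. Illustrative only; a real adapter
# would parse and forward records into the intake pipeline instead of
# just counting them.
import socket
import threading

def handle_client(conn, counts, lock):
    # Read newline-delimited records until the client disconnects.
    n = 0
    with conn, conn.makefile("rb") as f:
        for _record in f:
            n += 1
    with lock:
        counts.append(n)

def serve(n_clients):
    counts, lock = [], threading.Lock()
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen()
    port = srv.getsockname()[1]

    def acceptor():
        workers = []
        for _ in range(n_clients):
            conn, _addr = srv.accept()
            t = threading.Thread(target=handle_client,
                                 args=(conn, counts, lock))
            t.start()          # do not block on this client; keep accepting
            workers.append(t)
        for t in workers:
            t.join()
        srv.close()

    acc = threading.Thread(target=acceptor)
    acc.start()
    return port, counts, acc

def send_records(port, n):
    # Simulated client: stream n newline-delimited records, then close.
    with socket.socket() as c:
        c.connect(("127.0.0.1", port))
        c.sendall(b"record\n" * n)

port, counts, acc = serve(2)
t1 = threading.Thread(target=send_records, args=(port, 1000))
t2 = threading.Thread(target=send_records, args=(port, 500))
t1.start(); t2.start()
t1.join(); t2.join()
acc.join()  # both clients were served concurrently; neither waited
```

The key difference from a one-connection-at-a-time server is that `accept()` is called again immediately after handing the connection off, rather than after the first client finishes.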

Cheers,
~Abdullah.

> On May 17, 2017, at 1:25 PM, Xikui Wang <[email protected]> wrote:
> 
> Hi,
> 
> First, 3) won't work well, because the socket server inside AsterixDB
> accepts client connections one at a time. If two clients send data to one
> socket simultaneously, what you will observe is that the 1st client goes
> through while the 2nd blocks after a few hundred records, and it stays
> blocked until the 1st one finishes.
> 
> The comparison between 1) and 2) is interesting. (@Abdullah, please correct
> me if I'm wrong.) IMO, 1) achieves parallelism at the operator level by
> running the intake operator on the designated nodes simultaneously, while
> 2) achieves it at the job level by simply putting up several jobs that run
> independently. I think 1) may have less overhead than 2), since the parts
> of the workflow that could be shared are duplicated multiple times in 2).
> It would be useful to see how the two perform under saturated conditions.
> 
> Best,
> Xikui
> 
> On Wed, May 17, 2017 at 12:11 PM, Mike Carey <[email protected]> wrote:
> 
>> @Xikui?  @Abdullah?
>> 
>> 
>> 
>> On 5/17/17 11:40 AM, Ildar Absalyamov wrote:
>> 
>>> In light of Steven’s discussion about feeds in a parallel thread, I was
>>> wondering what the correct way would be to push parallel ingestion as far
>>> as possible in a multinode/multipartition environment.
>>> In one of my experiments I am trying to saturate ingestion to see the
>>> effect of computing stats in the background.
>>> A few things I’ve tried:
>>> 1) Open a socket adapter on all NC:
>>> create feed Feed using socket_adapter
>>> (
>>>     ("sockets"="NC1:10001,NC2:10001,…"),
>>> …)
>>> 
>>> 2) Connect several Feeds to a single dataset.
>>> create feed Feed1 using socket_adapter
>>> (
>>>     ("sockets"="NC1:10001"),
>>> …)
>>> create feed Feed2 using socket_adapter
>>> (
>>>     ("sockets"="NC2:10001"),
>>> …)
>>> 
>>> 3) Have several nodes sending data into a single socket.
>>> 
>>> In my previous experiments parallelization did not help much, since the
>>> bottleneck was on the sender side, but I am wondering whether that is
>>> still the case, given how much has changed under the hood since then.
>>> 
>>> Best regards,
>>> Ildar
>>> 
>> 
>> 
