Re: Slow download of segments from deep storage

Gian Merlino Wed, 30 Jan 2019 16:41:14 -0800

I believe today, if you use the (experimental) HTTP-based load queues, they
will parallelize segment downloads. Adding similar functionality for the
ZK-based load queues would definitely be useful though, since at this time
nobody seems to be actively driving the effort to enable HTTP-based load
queues by default.


On Wed, Jan 30, 2019 at 7:20 PM Samarth Jain <sama...@apache.org> wrote:

> We noticed that it takes a long time for the historicals to download
> segments from deep storage (in our case S3). Looking closer at the code in
> ZKCoordinator, I noticed that the segment download happens in a
> single-threaded fashion. The download runs on the single-threaded
> ExecutorService used by the PathChildrenCache. Judging by the discussion on
> https://github.com/apache/incubator-druid/issues/4421 and
> https://github.com/apache/incubator-druid/issues/3202, the executor service
> used in PathChildrenCache can only be single-threaded.
>
> My proposal is to use a multi-threaded ExecutorService to act on the
> events and perform the download. The single-threaded ExecutorService in
> PathChildrenCache would then simply delegate the download task to this new
> executor service.
>
> Does that sound feasible? IMO, if this turns out to be functionally correct,
> it should significantly reduce the time historicals take to download all
> their assigned segments.
>
> I would be more than happy to contribute this enhancement to the community.
>
> Thanks,
> Samarth
>
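
For readers of the thread, the delegation idea proposed above can be sketched
roughly as follows. This is only an illustration under stated assumptions, not
Druid's or Curator's actual code: the class DelegatingLoadSketch, the methods
loadAll and downloadSegment, and the parameter numLoadingThreads are all
hypothetical names, and a plain single-threaded executor stands in for the one
PathChildrenCache uses to deliver events.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class DelegatingLoadSketch {

    // Hypothetical placeholder for pulling one segment from deep storage.
    static void downloadSegment(String segmentId) {
        try {
            Thread.sleep(50); // simulate download latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns the number of segments whose download completed.
    static int loadAll(List<String> segmentIds, int numLoadingThreads) {
        // Stand-in (assumption) for the single event thread of PathChildrenCache.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        // The new multi-threaded executor that does the slow work.
        ExecutorService downloadPool = Executors.newFixedThreadPool(numLoadingThreads);
        AtomicInteger loaded = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(segmentIds.size());
        try {
            for (String id : segmentIds) {
                // The event thread only hands the task off, so it returns
                // immediately and can process the next event.
                eventThread.execute(() -> downloadPool.execute(() -> {
                    try {
                        downloadSegment(id);
                        loaded.incrementAndGet();
                    } finally {
                        done.countDown();
                    }
                }));
            }
            done.await(); // wait until every download has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            eventThread.shutdown();
            downloadPool.shutdown();
        }
        return loaded.get();
    }

    public static void main(String[] args) {
        System.out.println(loadAll(List.of("seg-1", "seg-2", "seg-3", "seg-4"), 2));
    }
}
```

One caveat with this pattern: once work leaves the single event thread, events
are no longer guaranteed to be processed in arrival order, so a real
implementation would likely need to make sure that, for example, a load and a
later drop of the same segment cannot overtake each other.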
