[ 
https://issues.apache.org/jira/browse/IMPALA-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-5953:
-------------------------------
    Issue Type: Task  (was: Sub-task)
        Parent:     (was: IMPALA-5865)

> Consider moving wholesale to kudu::Thread/ThreadPool
> ----------------------------------------------------
>
>                 Key: IMPALA-5953
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5953
>             Project: IMPALA
>          Issue Type: Task
>          Components: Backend
>            Reporter: Sailesh Mukil
>            Priority: Major
>              Labels: observability, refactor, util
>
> I went over the effort required to switch to kudu::Thread and 
> kudu::ThreadPool completely and I've broken it down into some high level 
> tasks, which are listed below.
> A rough estimate of time needed for this task with one person on it is 1 week 
> without too many distractions.
> +Diffs between kudu::Thread and impala::Thread:+
> * kudu::Thread use pthreads, whereas impala::Thread uses boost::thread().  
> Need to measure for performance differences. Effort: High - for perf 
> evaluation
> * kudu::Thread has no ThreadGroup equivalent. Effort: Easy
> * kudu::Thread needs to use a ThreadJoiner to join threads. Effort: Easy
> * kudu::Thread uses kudu::MetricEntity whereas impala::Thread uses 
> impala::MetricGroup. Effort: Moderate
> * kudu::Thread does not understand impala::Webserver. Needs to be implemented 
> inside Kudu. Effort: Moderate/High
> +Diffs between kudu::ThreadPool impala::ThreadPool:+
> kudu::ThreadPool has very different APIs than impala::ThreadPool
> * kudu::ThreadPool has no CallableThreadPool, but has SubmitFunction() which 
> tries to achieve the same thing. Effort: Moderate
> * kudu::ThreadPool does not have a default worker task that we can assign at 
> object creation time.
> * We either need to convert everything to Submit(Runnable) or SubmitFunc(). 
> Effort: High, and we need to check if this affects performance in any way due 
> to the increased rate of passing around function pointers.
> ---------------
> Below, I list the things we'd need to add to the kudu::Thread code and 
> differentiate between the possibility of having a shared utility repo for 
> Impala and Kudu, versus just periodically updating the kudu utils code inside 
> Impala:
> # Add a ThreadGroup class (Easy, non-invasive)
> # Add a ThreadPool type that takes a worker task on object creation. Only if 
> needed. (Easy, non-invasive)
> # Add a Impala specific instrumentation functions for metrics. (Easy, but 
> slightly invasive)
> # Add a impala Webserver callback for kudu::Thread  (Moderate/High, quite 
> invasive as we need to include an Impala header inside kudu) I already got 
> this working 80% last night.
> An alternative would be to have a function that exports all thread 
> information, and have another callback from the Impala code registered that 
> uses this exported information to display on the webpage. That way, we avoid 
> including Impala headers inside Kudu. But this would involve more work since 
> we'd need to add a struct/class that's understood by both Impala and Kudu.
> Tasks 1 and 2 are only "adding" to the code and not modifying any existing 
> code.
> * +Shared repo:+ If we're sharing a repo, it can just be added to that repo 
> as new APIs.
> * +Periodic update:+ It will be easy to keep moving this to the newer copies 
> since we can pretty much copy paste, unless the kudu::Thread APIs change, 
> which I doubt they will.
> Task 3 is slightly invasive:
> * +Shared repo:+ If we're sharing a repo, then we will most likely need to 
> standardize on other things as well, such as Metrics, in which case, this 
> wouldn't be invasive any more, and we can just use Kudu's current 
> instrumentation code.
> * +Periodic update:+ Even though it's slightly invasive, it wouldn't be hard 
> to keep a track of and move the diffs to the newer copies.
> Task 4:
> * +Shared repo:+ If we're sharing a repo, then we will have to go with the 
> suggestion of exporting thread instrumentation as I suggested above.
> * +Periodic update:+ It wouldn't be hard to keep copying to the newer copies, 
> since the only invasive part is the adding of the Impala header.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to