[ 
https://issues.apache.org/jira/browse/HIVE-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12050:
------------------------------------
    Description: With directory listing, ETL vs BI decision, local cache, 
metastore cache, with PPD, file footers, and combination thereof (e.g. most 
splits are processed via metastore PPD but some files are not cached and we 
need to make ETL vs BI decision), some of which are blocking and some not, -I 
want to write ORC split generation in Erlang- strategies are no longer the best 
model to schedule all the work. Some messaging or task queue based model might 
be better where each work item that is blocking (dir listing, file read, 
metastore call, etc.) generates a list of things, and things are further
processed until all things are splits and there are no more other things to 
process.  (was: With directory listing, ETL vs BI decision, local cache, 
metastore cache, with PPD, file footers, and combination thereof (e.g. most 
splits are processed via metastore PPD but some files are not cached and we 
need to make ETL vs BI decision), some of which are blocking and some not, -I 
want to write ORC split generation in Erlang- strategies are no longer the best 
model to schedule all the work. Some messaging or task queue based model might 
be better where e.g. directory listers generate file lists that are processed 
by a decision-making strategy that might go to metastore or cache or files, and 
remaining files might again be processed through decision making or through a 
different cache or files, etc.)

> change ORC split generation to use a different model
> ----------------------------------------------------
>
>                 Key: HIVE-12050
>                 URL: https://issues.apache.org/jira/browse/HIVE-12050
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>
> With directory listing, ETL vs BI decision, local cache, metastore cache, 
> with PPD, file footers, and combination thereof (e.g. most splits are 
> processed via metastore PPD but some files are not cached and we need to make 
> ETL vs BI decision), some of which are blocking and some not, -I want to 
> write ORC split generation in Erlang- strategies are no longer the best model 
> to schedule all the work. Some messaging or task queue based model might be 
> better where each work item that is blocking (dir listing, file read, 
> metastore call, etc.) generates a list of things, and things are further 
> processed until all things are splits and there are no more other things to 
> process.
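
A minimal sketch of the queue-based model described above, written in Python for brevity (Hive itself is Java). The work-item types (ListDir, MetastoreLookup, ReadFooter, Split), the FakeSources stand-in, and the "one split per stripe" rule are hypothetical and are not Hive APIs. Each blocking step simply returns more work items, and the loop ends once only splits remain. Items run sequentially here; a real implementation would presumably hand blocking items to an executor and feed their results back into the queue.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# Hypothetical work-item types for illustration; these are not Hive classes.
@dataclass(frozen=True)
class ListDir:          # blocking: directory listing
    path: str

@dataclass(frozen=True)
class MetastoreLookup:  # blocking: metastore call (footer cache / PPD)
    path: str

@dataclass(frozen=True)
class ReadFooter:       # blocking: read the ORC file footer from storage
    path: str

@dataclass(frozen=True)
class Split:            # terminal item: nothing left to process
    path: str
    offset: int
    length: int

WorkItem = Union[ListDir, MetastoreLookup, ReadFooter, Split]

@dataclass
class FakeSources:
    """In-memory stand-ins for the file system and the metastore cache."""
    dirs: Dict[str, List[str]]
    stripes: Dict[str, List[Tuple[int, int]]]
    metastore_cache: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)

def process(item: WorkItem, src: FakeSources) -> List[WorkItem]:
    """Run one non-terminal work item and return the items it generates."""
    if isinstance(item, ListDir):
        # Subdirectories get listed again; files go to the metastore first.
        return [ListDir(c) if c in src.dirs else MetastoreLookup(c)
                for c in src.dirs[item.path]]
    if isinstance(item, MetastoreLookup):
        cached = src.metastore_cache.get(item.path)
        if cached is not None:
            return [Split(item.path, off, ln) for off, ln in cached]
        return [ReadFooter(item.path)]  # cache miss: fall back to the file
    if isinstance(item, ReadFooter):
        return [Split(item.path, off, ln) for off, ln in src.stripes[item.path]]
    raise TypeError(f"unexpected work item: {item!r}")

def generate_splits(root: str, src: FakeSources) -> List[Split]:
    """Drain the queue until every remaining item is a Split."""
    queue = deque([ListDir(root)])
    splits: List[Split] = []
    while queue:
        item = queue.popleft()
        if isinstance(item, Split):
            splits.append(item)
        else:
            queue.extend(process(item, src))
    return splits

sources = FakeSources(
    dirs={"/t": ["/t/p=1", "/t/a.orc"], "/t/p=1": ["/t/p=1/b.orc"]},
    stripes={"/t/a.orc": [(3, 100), (103, 50)], "/t/p=1/b.orc": [(3, 80)]},
    metastore_cache={"/t/a.orc": [(3, 100), (103, 50)]},  # b.orc is not cached
)
splits = generate_splits("/t", sources)
```

With this sample data, a.orc is resolved from the metastore cache without touching the file, while b.orc misses the cache and falls back to a footer read, which mirrors the mixed case in the description. Adding another source (a local cache, an ETL vs BI decision step) would mean adding one item type and one branch in process, rather than a new strategy.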



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
