[ 
https://issues.apache.org/jira/browse/FLINK-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15318374#comment-15318374
 ] 

Flavio Pompermaier commented on FLINK-3777:
-------------------------------------------

In our use case we have a very complex query that produces about 11 billion 
records, and we ran some benchmarks to determine the best split size.
The best split size turned out to be around 100k (per query) because, as you 
stated, there is a trade-off: more splits mean more complexity on the 
JobManager side, while fewer, larger splits strain the database server's 
ability to answer queries over a wide range of keys. 
Splitting the entire key set into just a small number of splits causes the job 
to die because the queries never end (i.e. timeout exceptions).

That was our "painful" experience.
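
To make the split sizing above concrete, here is a minimal sketch of how a 
numeric key range could be cut into fixed-size ranges, one per query. It 
assumes the keys are contiguous long values and that "100k" means the number 
of keys covered by each query; KeyRangeSplitter, KeyRange and split() are 
hypothetical names used only for illustration, not Flink classes.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyRangeSplitter {

    /** Inclusive key range [min, max] served by a single split (one query). */
    public static final class KeyRange {
        public final long min;
        public final long max;

        public KeyRange(long min, long max) {
            this.min = min;
            this.max = max;
        }

        @Override
        public String toString() {
            return "[" + min + ", " + max + "]";
        }
    }

    /**
     * Cuts [minKey, maxKey] into consecutive ranges of at most keysPerSplit keys.
     * Assumes maxKey - minKey does not overflow a long.
     */
    public static List<KeyRange> split(long minKey, long maxKey, long keysPerSplit) {
        if (keysPerSplit <= 0) {
            throw new IllegalArgumentException("keysPerSplit must be positive");
        }
        if (maxKey < minKey) {
            throw new IllegalArgumentException("maxKey must be >= minKey");
        }
        List<KeyRange> ranges = new ArrayList<>();
        long start = minKey;
        while (true) {
            // The last range may be shorter than keysPerSplit.
            long end = (maxKey - start < keysPerSplit - 1)
                    ? maxKey
                    : start + keysPerSplit - 1;
            ranges.add(new KeyRange(start, end));
            if (end == maxKey) {
                break;
            }
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // One million keys at 100k keys per query.
        List<KeyRange> ranges = split(1, 1_000_000, 100_000);
        System.out.println(ranges.size() + " splits, first " + ranges.get(0)
                + ", last " + ranges.get(ranges.size() - 1));
    }
}
```

Each range would then parameterize one bounded query (for example a BETWEEN 
clause on the key column), so a smaller keysPerSplit produces more splits for 
the JobManager to track and a larger one produces heavier queries for the 
database.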

> Add open and close methods to manage IF lifecycle
> -------------------------------------------------
>
>                 Key: FLINK-3777
>                 URL: https://issues.apache.org/jira/browse/FLINK-3777
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.0.1
>            Reporter: Flavio Pompermaier
>            Assignee: Flavio Pompermaier
>              Labels: inputformat, lifecycle
>
> At the moment the opening and closing of an InputFormat are not managed, 
> although open() could be (improperly IMHO) simulated by configure().
> This limits the possibility to reuse expensive resources (like database 
> connections) and manage their release. 
> Probably the best option would be to add two methods (i.e. openInputFormat() 
> and closeInputFormat()) to RichInputFormat*
> * NOTE: the best option from a "semantic" point of view would be to rename 
> the current open() and close() to openSplit() and closeSplit() respectively 
> while using open() and close() methods for the IF lifecycle management, but 
> this would cause a backward compatibility issue...
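
The lifecycle proposed in the description can be sketched roughly as follows. 
This is a self-contained illustration, not Flink code: LifecycleInputFormat, 
RecordingFormat and runTask() are hypothetical stand-ins, and only the method 
names openInputFormat() and closeInputFormat() come from the proposal. The 
expensive resource (here a pretend connection) is acquired once per task and 
reused across all splits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InputFormatLifecycleSketch {

    /** Hypothetical stand-in for the proposed lifecycle; not Flink's RichInputFormat. */
    interface LifecycleInputFormat {
        void openInputFormat();   // once per task, before the first split
        void open(String split);  // once per split
        void close();             // once per split
        void closeInputFormat();  // once per task, after the last split
    }

    /** Records lifecycle events; "connect"/"disconnect" model an expensive resource. */
    static final class RecordingFormat implements LifecycleInputFormat {
        final List<String> events = new ArrayList<>();
        private boolean connected;

        @Override
        public void openInputFormat() {
            connected = true;
            events.add("connect");
        }

        @Override
        public void open(String split) {
            if (!connected) {
                throw new IllegalStateException("openInputFormat() was not called");
            }
            events.add("open " + split);
        }

        @Override
        public void close() {
            events.add("close split");
        }

        @Override
        public void closeInputFormat() {
            connected = false;
            events.add("disconnect");
        }
    }

    /** Drives the format the way a task might: one connection, many splits. */
    static void runTask(LifecycleInputFormat format, List<String> splits) {
        format.openInputFormat();
        try {
            for (String split : splits) {
                format.open(split);
                try {
                    // Records of this split would be read here.
                } finally {
                    format.close();
                }
            }
        } finally {
            // Released even if reading a split fails.
            format.closeInputFormat();
        }
    }

    public static void main(String[] args) {
        RecordingFormat format = new RecordingFormat();
        runTask(format, Arrays.asList("split-0", "split-1"));
        System.out.println(format.events);
    }
}
```

With only configure() available, there is no matching hook in which to 
release the connection, which is the gap the two new methods are meant to 
close.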



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)