[ https://issues.apache.org/jira/browse/LIVY-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935852#comment-16935852 ]

Marco Gaido commented on LIVY-667:
----------------------------------

[~yihengw] yes, that's true. The point is that the data needs to be transferred 
through JDBC anyway, so sending very large datasets over the wire may not be 
very efficient. Moreover, if you have a single very big partition, you can 
always repartition it and avoid the issue. My point here is that there are 
workarounds for this use case and I don't expect the problem to come up in 
typical usage, so designing something specifically for a corner case feels 
like overkill to me. It is also possible to do manually what is suggested 
here: i.e., create a table with the result of the query (which writes to HDFS) 
and then read that table...
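
For reference, a minimal sketch of both workarounds, assuming a SparkSession
named spark on the Livy session side; big_table and tmp_result are hypothetical
names:

    // 1) Repartition so that no single partition is too large for the driver
    //    when results are pulled in one partition at a time.
    val df = spark.sql("SELECT * FROM big_table")
    df.repartition(200).createOrReplaceTempView("big_table_repartitioned")

    // 2) Materialize the query result on HDFS first (CTAS), then read the
    //    table back instead of streaming the full result set over JDBC.
    spark.sql("CREATE TABLE tmp_result USING parquet AS SELECT * FROM big_table")
    val result = spark.table("tmp_result")
    result.show(10)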

> Support query a lot of data.
> ----------------------------
>
>                 Key: LIVY-667
>                 URL: https://issues.apache.org/jira/browse/LIVY-667
>             Project: Livy
>          Issue Type: Bug
>          Components: Thriftserver
>    Affects Versions: 0.6.0
>            Reporter: runzhiwang
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When livy.server.thrift.incrementalCollect is enabled, the Thrift server uses 
> toLocalIterator to load one partition at a time instead of the whole RDD, in 
> order to avoid OutOfMemory. However, if the largest partition is too big, 
> OutOfMemory still occurs.
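
A minimal sketch of the behavior described above, assuming a SparkSession named
spark; the range here is just a stand-in for a query result:

    val rdd = spark.range(0, 1000000).rdd

    // collect() would pull every partition to the driver at once and can OOM.
    // val all = rdd.collect()

    // With livy.server.thrift.incrementalCollect enabled, the Thrift server
    // uses toLocalIterator instead, which fetches one partition at a time, so
    // driver memory is bounded by the largest single partition -- the
    // remaining risk described in this issue.
    val it = rdd.toLocalIterator
    it.take(10).foreach(println)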


