[
https://issues.apache.org/jira/browse/NIFI-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027039#comment-17027039
]
Shawn Weeks commented on NIFI-7086:
-----------------------------------
Did you increase your fetch size in the DBCP Connection Pool? The default is
really low in the Oracle JDBC driver and makes fetches take forever. Try adding
a property to your connection pool called defaultRowPrefetch and set it to
1000. That should make a huge difference. I think newer versions of NiFi will
allow setting the fetch size directly. Once you've done that, you can look into
how to split the data into manageable chunks, because fetching 1 billion rows is
going to take forever no matter what you do. The solution I used is a
combination of a distributed map cache and date calculations to break the
fetches into 30-day chunks: the idea is to query the database for date ranges
and then use those ranges to fetch the data in parts. If you get on Slack and
reach out to me I can try and walk you through it; I'm usually on there.
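For context: defaultRowPrefetch is an Oracle JDBC driver connection property (the driver's default is only 10 rows per network round trip), and NiFi's DBCPConnectionPool passes dynamic properties through to the driver. A minimal sketch of what that property looks like at the plain JDBC level; the class name, user, and password are placeholders, and no actual connection is made here:

```java
import java.util.Properties;

public class PrefetchExample {
    // Build the connection properties a DBCPConnectionPool would pass to the
    // Oracle JDBC driver. Bumping defaultRowPrefetch from the driver default
    // of 10 to 1000 cuts the number of round trips per result set ~100x.
    public static Properties oracleProps(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("defaultRowPrefetch", "1000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = oracleProps("scott", "tiger");
        // These props would be handed to DriverManager.getConnection(url, props)
        System.out.println("defaultRowPrefetch = "
                + props.getProperty("defaultRowPrefetch"));
    }
}
```

In NiFi itself you don't write this code: you add defaultRowPrefetch as a dynamic property on the DBCPConnectionPool controller service and it is forwarded to the driver the same way.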
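The 30-day chunking described above comes down to simple date arithmetic: split the table's overall date range into fixed-size windows and fetch each window as its own query. The class and method names below are hypothetical (in NiFi the windows would typically become flowfile attributes feeding ExecuteSQL rather than Java code), but the windowing logic is the same:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class DateChunks {
    // Split [start, end) into consecutive windows of at most `days` days.
    // Each window [w0, w1) can be fetched independently, e.g. with
    //   SELECT ... WHERE ts >= :w0 AND ts < :w1
    public static List<LocalDate[]> chunks(LocalDate start, LocalDate end, int days) {
        List<LocalDate[]> out = new ArrayList<>();
        LocalDate cur = start;
        while (cur.isBefore(end)) {
            LocalDate next = cur.plusDays(days);
            if (next.isAfter(end)) {
                next = end; // clamp the last window to the overall end date
            }
            out.add(new LocalDate[] {cur, next});
            cur = next;
        }
        return out;
    }

    public static void main(String[] args) {
        for (LocalDate[] w : chunks(LocalDate.of(2020, 1, 1),
                                    LocalDate.of(2020, 3, 15), 30)) {
            System.out.println(w[0] + " .. " + w[1]);
        }
    }
}
```

The distributed map cache then only has to remember the last completed window, so the flow can resume from where it left off instead of re-fetching everything.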
> Oracle DB read is slow (for me it's a bug)
> ------------------------------------------
>
> Key: NIFI-7086
> URL: https://issues.apache.org/jira/browse/NIFI-7086
> Project: Apache NiFi
> Issue Type: Bug
> Environment: nifi 1.8.0
> Reporter: naveen kumar saharan
> Priority: Critical
>
> I am not able to fetch from an Oracle DB table of 1 billion records. It is
> taking too much time (17 hours).
> I tried creating queries based on dates using ExecuteSQL ->
> GenerateTableFetch -> ExecuteSQL for parallel execution.
> Small tables also perform slowly compared to a Python database table fetch
> program, around 20 times slower. This is very disappointing.
> QueryDatabaseTable runs only on the primary node; if I increase the thread
> count it gives duplicate data.
> Then what is the use of concurrent threads?
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)