Thanks Michael, much appreciated!
Nothing should be held in memory for a query like this (other than a single
count per partition), so I don't think that is the problem. There is
likely an error buried somewhere.
Regarding your comments above: I don't get any error; the query just returns NULL.
I am trying to access a mid-size Teradata table (~100 million rows) via
JDBC in standalone mode on a single node (local[*]). When I tried the same
with a big table (5B records), no results were returned when the query
completed.
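For reference, a JDBC read in Spark 1.4 can be partitioned so that each task pulls only a slice of the table; without the partitioning options, a single task scans everything. A minimal sketch, where the URL, table name, partition column, and bounds are all placeholders you would replace with your own:

```scala
// Sketch of a partitioned JDBC read with Spark 1.4's DataFrameReader.
// All connection details below are hypothetical placeholders.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)  // `sc` is the existing SparkContext

val df = sqlContext.read.format("jdbc").options(Map(
  "url"             -> "jdbc:teradata://td-host/DATABASE=mydb", // placeholder URL
  "dbtable"         -> "big_table",                             // placeholder table
  "partitionColumn" -> "id",           // assumed numeric, roughly uniform column
  "lowerBound"      -> "1",
  "upperBound"      -> "5000000000",
  "numPartitions"   -> "48"            // ~2 partitions per core on a 24-core box
)).load()

df.count()
```

Each partition becomes a separate query with a WHERE clause over the partition column, so the 24 cores can read concurrently instead of funneling 5B rows through one connection.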
I am using Spark 1.4.1, set up on a very powerful machine (2 CPUs, 24
cores,
Much appreciated! I am not comparing against select count(*) for
performance; it was just one simple thing I tried in order to check
performance :). It now makes sense, since Spark extracts all the records
before doing the count. I thought that having an aggregate-function query
submitted over
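If the goal is just the count, one way to avoid pulling every row through Spark is to push the aggregation down to Teradata by passing a subquery in place of a table name in the `dbtable` option. A hedged sketch, with the URL and table name as placeholders:

```scala
// Sketch: let Teradata compute the count so only one row crosses the wire.
// The JDBC source accepts a parenthesized subquery (with an alias) wherever
// a table name is expected. Connection details are hypothetical.
val countDF = sqlContext.read.format("jdbc").options(Map(
  "url"     -> "jdbc:teradata://td-host/DATABASE=mydb",   // placeholder URL
  "dbtable" -> "(SELECT COUNT(*) AS cnt FROM big_table) t" // pushed-down count
)).load()

countDF.show()  // single row containing the count
```

Here the database does the aggregation, so the result arrives in milliseconds regardless of table size, whereas `df.count()` on a plain JDBC DataFrame fetches the rows first.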