I will see if it works with specifying the join condition. On Wed, Apr 27, 2016 at 2:28 PM Tim Armstrong <[email protected]> wrote:
> You need to specify a join condition. I.e. table_a join table_b on > table_a.key = table_2.key > > table_a join table_b computes the cross-product of the two tables. > > You can confirm by running explain: > > explain select count(*) from tab3_test_new join tab4_test_new > > > It will show a plan like this with a "CROSS JOIN" in it. > > [tarmstrong-box.ca.cloudera.com:21000] > explain select * from > functional.alltypes join functional.alltypes t2; > Query: explain select * from functional.alltypes join functional.alltypes > t2 > +-----------------------------------------------------------+ > | Explain String | > +-----------------------------------------------------------+ > | Estimated Per-Host Requirements: Memory=320.68MB VCores=2 | > | | > | 04:EXCHANGE [UNPARTITIONED] | > | | | > | 02:NESTED LOOP JOIN [CROSS JOIN, BROADCAST] | > | | | > | |--03:EXCHANGE [BROADCAST] | > | | | | > | | 01:SCAN HDFS [functional.alltypes t2] | > | | partitions=24/24 files=24 size=478.45KB | > | | | > | 00:SCAN HDFS [functional.alltypes] | > | partitions=24/24 files=24 size=478.45KB | > +-----------------------------------------------------------+ > > > On Wed, Apr 27, 2016 at 2:21 PM, Pradeep Nayak <[email protected]> > wrote: > >> 40 CPU and 156GB RAM per nodemanager. >> We are trying to do some validation on Impala. We are using a 5 node >> hadoop >> cluster m4.10x large machines on the aws. These were setup with cloudera >> manager. Each node: 40 cpus' and 156GB ram. >> >> We are trying to do a count(*) on two tables with 14 million rows each. >> When do a count(*) individually on each of these nodes, then it works >> fine. >> However we try to do a join on these two tables and do a count(*) we hit >> the problem below saying memory limit exceeded. >> >> Any inputs here are appreciated ? Should we need a bigger cluster for this >> ? How do we decide the size of the cluster. >> >> >> ============ >> >> [ip-172-30-1-57.ec2.internal:21000] > select count(*) from tab3_test_new; >> >> Query: select count(*) from tab3_test_new >> >> +----------+ >> >> | count(*) | >> >> +----------+ >> >> | 14216336 | >> >> +----------+ >> >> Fetched 1 row(s) in 8.29s >> >> [ip-172-30-1-57.ec2.internal:21000] > select count(*) from tab4_test_new; >> >> Query: select count(*) from tab4_test_new >> >> +----------+ >> >> | count(*) | >> >> +----------+ >> >> | 14987634 | >> >> +----------+ >> >> Fetched 1 row(s) in 11.46s >> >> [ip-172-30-1-57.ec2.internal:21000] > >> >> ip-172-30-1-57.ec2.internal:21000] > set mem_limit=64g; >> >> MEM_LIMIT set to 64g >> >> [ip-172-30-1-57.ec2.internal:21000] > select count(*) from tab3_test_new >> join tab4_test_new; >> >> Query: select count(*) from tab3_test_new join tab4_test_new >> >> WARNINGS: >> >> Memory Limit Exceeded >> >> Query(b44dde4886ccafd2:cfb6b9ffcdc4e3a6) Limit: Limit=64.00 GB >> Consumption=80.77 MB >> >> Fragment b44dde4886ccafd2:cfb6b9ffcdc4e3a7: Consumption=12.00 KB >> >> AGGREGATION_NODE (id=6): Consumption=4.00 KB >> >> EXCHANGE_NODE (id=5): Consumption=0 >> >> DataStreamRecvr: Consumption=0 >> >> Block Manager: Limit=156.00 MB Consumption=0 >> >> Fragment b44dde4886ccafd2:cfb6b9ffcdc4e3a8: Consumption=64.62 MB >> >> AGGREGATION_NODE (id=3): Consumption=4.00 KB >> >> NESTED_LOOP_JOIN_NODE (id=2): Consumption=64.06 MB >> >> HDFS_SCAN_NODE (id=0): Consumption=0 >> >> EXCHANGE_NODE (id=4): Consumption=0 >> >> DataStreamRecvr: Consumption=544.00 KB >> >> DataStreamSender: Consumption=16.00 KB >> >> Fragment b44dde4886ccafd2:cfb6b9ffcdc4e3a9: Consumption=16.14 MB >> >> HDFS_SCAN_NODE (id=1): Consumption=16.01 MB >> >> DataStreamSender: Consumption=128.00 KB >> >> WARNING: The following tables are missing relevant table and/or column >> statistics. >> >> test_db.tab3_test_new,test_db.tab4_test_new >> >> >> >> [ip-172-30-1-57.ec2.internal:21000] > >> > > -- > You received this message because you are subscribed to the Google Groups > "Impala Dev" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. >
