Hi community, I have set up a 3-node Spark cluster in standalone mode; each machine has 16G of memory and 4 cores.
When I run

    val file = sc.textFile("/user/hive/warehouse/b/test.txt")
    file.filter(line => line.contains("2013-")).count()

it takes 2.7s.
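For reference, a minimal sketch of how this measurement can be reproduced in spark-shell (the timing wrapper around the same two lines is only illustrative; the path and "2013-" prefix are the ones from the snippet above):

    // Sketch only: runs the same filter + count and prints wall-clock time.
    val file = sc.textFile("/user/hive/warehouse/b/test.txt")
    val start = System.nanoTime()
    val matches = file.filter(line => line.contains("2013-")).count()
    val elapsedSec = (System.nanoTime() - start) / 1e9
    println("count = " + matches + ", took " + elapsedSec + " s")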
But when I run "select count(*) from b;" in Shark, it takes 15.81s.
So why does Shark take more time than Spark?
Other info:
1. I have set export SPARK_MEM=10g in shark-env.sh.
2. test.txt is 4.21G and exists in each machine's directory /user/hive/warehouse/b/, and test.txt has been loaded into memory (see the caching sketch after this list).
3. There are 38532979 lines in test.txt.
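In case "loaded into memory" in point 2 should mean the RDD itself is cached by Spark (rather than the file only sitting in the OS page cache), a minimal sketch of caching it explicitly in spark-shell, assuming the same path, would be:

    // Sketch only: keep the RDD in Spark's memory so later scans avoid disk.
    val file = sc.textFile("/user/hive/warehouse/b/test.txt").cache()
    file.count()   // first action reads from disk and populates the cache
    file.filter(line => line.contains("2013-")).count()   // later passes read from memory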