Hi everyone. First, I set up the ES server on AWS by following the instructions below, and TCP ports 9200 and 9300 are allowed in the security group:
wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.5.2.deb
sudo dpkg -i elasticsearch-1.5.2.deb

And I used sudo netstat -atnp to make sure the ports above are listening:

tcp6 0 0 :::9200 :::* LISTEN
tcp6 0 0 :::9300 :::* LISTEN

Then, my Scala code:

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

val sparkConf = new SparkConf().setAppName("Test")
  .setMaster("local[2]")
  .set("es.nodes", "52.68.202.80:9200")
  .set("es.nodes.discovery", "false")
val sc = new SparkContext(sparkConf)

// total 267 hits
val query = "{\"query\":{\"bool\":{\"must\":[{\"range\":{\"scan_time\":{\"from\":\"2014-09-01T00:00:00\",\"to\":\"2014-09-01T00:00:59\"}}}]}}}"
val data = sc.esRDD("wifi-collection/final_data", query)
data.collect().foreach(println)

It's weird that when I run the code on my laptop (localhost), I always get the following messages:

15/05/14 00:23:36 INFO SparkContext: Running Spark version 1.3.1
15/05/14 00:23:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/05/14 00:23:37 INFO SecurityManager: Changing view acls to: jeremy
15/05/14 00:23:37 INFO SecurityManager: Changing modify acls to: jeremy
15/05/14 00:23:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jeremy); users with modify permissions: Set(jeremy)
15/05/14 00:23:37 INFO Slf4jLogger: Slf4jLogger started
15/05/14 00:23:37 INFO Remoting: Starting remoting
15/05/14 00:23:37 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@orion:58665]
15/05/14 00:23:37 INFO Utils: Successfully started service 'sparkDriver' on port 58665.
15/05/14 00:23:37 INFO SparkEnv: Registering MapOutputTracker
15/05/14 00:23:37 INFO SparkEnv: Registering BlockManagerMaster
15/05/14 00:23:37 INFO DiskBlockManager: Created local directory at /var/folders/lz/bc5hqqsn1gvg2hl4b8svwd_w0000gn/T/spark-4d9ee290-78d6-4537-8975-33886ece0b86/blockmgr-469a0ac4-e62d-4e55-b848-c3ddc5bf121f
15/05/14 00:23:37 INFO MemoryStore: MemoryStore started with capacity 1966.1 MB
15/05/14 00:23:37 INFO HttpFileServer: HTTP File server directory is /var/folders/lz/bc5hqqsn1gvg2hl4b8svwd_w0000gn/T/spark-81548d67-dcb2-4e79-b382-79a2a0a32a76/httpd-e77bb5ee-c571-4a0b-bdab-8e8037a3205e
15/05/14 00:23:37 INFO HttpServer: Starting HTTP Server
15/05/14 00:23:37 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/14 00:23:37 INFO AbstractConnector: Started SocketConnector@0.0.0.0:58666
15/05/14 00:23:37 INFO Utils: Successfully started service 'HTTP file server' on port 58666.
15/05/14 00:23:37 INFO SparkEnv: Registering OutputCommitCoordinator
15/05/14 00:23:37 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/14 00:23:37 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/05/14 00:23:37 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/05/14 00:23:37 INFO SparkUI: Started SparkUI at http://orion:4040
15/05/14 00:23:38 INFO Executor: Starting executor ID <driver> on host localhost
15/05/14 00:23:38 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@orion:58665/user/HeartbeatReceiver
15/05/14 00:23:38 INFO NettyBlockTransferService: Server created on 58667
15/05/14 00:23:38 INFO BlockManagerMaster: Trying to register BlockManager
15/05/14 00:23:38 INFO BlockManagerMasterActor: Registering block manager localhost:58667 with 1966.1 MB RAM, BlockManagerId(<driver>, localhost, 58667)
15/05/14 00:23:38 INFO BlockManagerMaster: Registered BlockManager
15/05/14 00:23:39 INFO Version: Elasticsearch Hadoop v2.1.0.Beta4 [2c62e273d2]
15/05/14 00:23:39 INFO ScalaEsRDD: Reading from [wifi-collection/final_data]
15/05/14 00:23:39 INFO ScalaEsRDD: Discovered mapping {wifi-collection=[mappings=[final_data=[bssid=STRING, gps_lat=DOUBLE, gps_lng=DOUBLE, imei=STRING, m_lat=DOUBLE, m_lng=DOUBLE, net_lat=DOUBLE, net_lng=DOUBLE, no=LONG, rss=DOUBLE, s_no=LONG, scan_time=DATE, source=STRING, ssid=STRING, trace=LONG]]]} for [wifi-collection/final_data]
15/05/14 00:23:39 INFO SparkContext: Starting job: collect at App.scala:19
15/05/14 00:23:39 INFO DAGScheduler: Got job 0 (collect at App.scala:19) with 5 output partitions (allowLocal=false)
15/05/14 00:23:39 INFO DAGScheduler: Final stage: Stage 0(collect at App.scala:19)
15/05/14 00:23:39 INFO DAGScheduler: Parents of final stage: List()
15/05/14 00:23:39 INFO DAGScheduler: Missing parents: List()
15/05/14 00:23:39 INFO DAGScheduler: Submitting Stage 0 (ScalaEsRDD[0] at RDD at AbstractEsRDD.scala:17), which has no missing parents
15/05/14 00:23:39 INFO MemoryStore: ensureFreeSpace(1496) called with curMem=0, maxMem=2061647216
15/05/14 00:23:39 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1496.0 B, free 1966.1 MB)
15/05/14 00:23:39 INFO MemoryStore: ensureFreeSpace(1148) called with curMem=1496, maxMem=2061647216
15/05/14 00:23:39 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1148.0 B, free 1966.1 MB)
15/05/14 00:23:39 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:58667 (size: 1148.0 B, free: 1966.1 MB)
15/05/14 00:23:39 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/05/14 00:23:39 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:839
15/05/14 00:23:39 INFO DAGScheduler: Submitting 5 missing tasks from Stage 0 (ScalaEsRDD[0] at RDD at AbstractEsRDD.scala:17)
15/05/14 00:23:39 INFO TaskSchedulerImpl: Adding task set 0.0 with 5 tasks
15/05/14 00:23:39 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, ANY, 3352 bytes)
15/05/14 00:23:39 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, ANY, 3352 bytes)
15/05/14 00:23:39 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
15/05/14 00:23:39 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/05/14 00:24:54 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
15/05/14 00:24:54 INFO HttpMethodDirector: Retrying request
15/05/14 00:24:54 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
15/05/14 00:24:54 INFO HttpMethodDirector: Retrying request
15/05/14 00:26:09 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
15/05/14 00:26:09 INFO HttpMethodDirector: Retrying request
15/05/14 00:26:09 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
15/05/14 00:26:09 INFO HttpMethodDirector: Retrying request
15/05/14 00:27:25 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
15/05/14 00:27:25 INFO HttpMethodDirector: Retrying request
15/05/14 00:27:25 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
15/05/14 00:27:25 INFO HttpMethodDirector: Retrying request
15/05/14 00:28:40 ERROR NetworkClient: Node [Operation timed out] failed (172.31.14.100:9200); selected next node [52.68.202.80:9200]
15/05/14 00:28:40 ERROR NetworkClient: Node [Operation timed out] failed (172.31.14.100:9200); selected next node [52.68.202.80:9200]
15/05/14 00:28:40 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 18184 bytes result sent to driver
15/05/14 00:28:40 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, ANY, 3352 bytes)
15/05/14 00:28:40 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
15/05/14 00:28:40 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 301234 ms on localhost (1/5)
15/05/14 00:28:40 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 19532 bytes result sent to driver
15/05/14 00:28:40 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, ANY, 3352 bytes)
15/05/14 00:28:40 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
15/05/14 00:28:40 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 301315 ms on localhost (2/5)
15/05/14 00:29:55 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
15/05/14 00:29:55 INFO HttpMethodDirector: Retrying request
15/05/14 00:29:56 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
15/05/14 00:29:56 INFO HttpMethodDirector: Retrying request
15/05/14 00:31:11 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
15/05/14 00:31:11 INFO HttpMethodDirector: Retrying request
15/05/14 00:31:11 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
15/05/14 00:31:11 INFO HttpMethodDirector: Retrying request
15/05/14 00:32:26 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
15/05/14 00:32:26 INFO HttpMethodDirector: Retrying request
15/05/14 00:32:26 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
15/05/14 00:32:26 INFO HttpMethodDirector: Retrying request
15/05/14 00:33:42 ERROR NetworkClient: Node [Operation timed out] failed (172.31.14.100:9200); selected next node [52.68.202.80:9200]
15/05/14 00:33:42 ERROR NetworkClient: Node [Operation timed out] failed (172.31.14.100:9200); selected next node [52.68.202.80:9200]
15/05/14 00:33:42 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 18177 bytes result sent to driver
15/05/14 00:33:42 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, localhost, ANY, 3352 bytes)
15/05/14 00:33:42 INFO Executor: Running task 4.0 in stage 0.0 (TID 4)
15/05/14 00:33:42 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 301864 ms on localhost (3/5)
15/05/14 00:33:42 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 18176 bytes result sent to driver
15/05/14 00:33:42 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 301882 ms on localhost (4/5)
15/05/14 00:34:58 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
15/05/14 00:34:58 INFO HttpMethodDirector: Retrying request
15/05/14 00:36:13 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
15/05/14 00:36:13 INFO HttpMethodDirector: Retrying request
15/05/14 00:37:29 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
15/05/14 00:37:29 INFO HttpMethodDirector: Retrying request
15/05/14 00:38:44 ERROR NetworkClient: Node [Operation timed out] failed (172.31.14.100:9200); selected next node [52.68.202.80:9200]
15/05/14 00:38:45 INFO Executor: Finished task 4.0 in stage 0.0 (TID 4). 19190 bytes result sent to driver
15/05/14 00:38:45 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 302573 ms on localhost (5/5)
15/05/14 00:38:45 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/05/14 00:38:45 INFO DAGScheduler: Stage 0 (collect at App.scala:19) finished in 905.680 s
15/05/14 00:38:45 INFO DAGScheduler: Job 0 finished: collect at App.scala:19, took 905.884621 s

* Finally, after the long wait, the foreach(println) starts to print:

(AU1KNAQHKoN_e7xsz2J7,Map(no -> 38344049, s_no -> 2988722, source -> wifiscan2, bssid -> d850e6d5a770, ssid -> jasonlan, rss -> -91.0, gps_lat -> 24.99175413, gps_lng -> 121.28153416, net_lat -> -10000.0, net_lng -> -10000.0, m_lat -> 24.9916650318, m_lng -> 121.281452128, imei -> 352842060663324, scan_time -> Mon Sep 01 08:00:06 CST 2014, trace -> 1))

* and then it keeps trying to connect:

15/05/14 00:38:45 INFO SparkContext: Starting job: count at App.scala:20
15/05/14 00:38:45 INFO DAGScheduler: Got job 1 (count at App.scala:20) with 5 output partitions
(allowLocal=false)
15/05/14 00:38:45 INFO DAGScheduler: Final stage: Stage 1(count at App.scala:20)
15/05/14 00:38:45 INFO DAGScheduler: Parents of final stage: List()
15/05/14 00:38:45 INFO DAGScheduler: Missing parents: List()
15/05/14 00:38:45 INFO DAGScheduler: Submitting Stage 1 (ScalaEsRDD[0] at RDD at AbstractEsRDD.scala:17), which has no missing parents
15/05/14 00:38:45 INFO MemoryStore: ensureFreeSpace(1464) called with curMem=2644, maxMem=2061647216
15/05/14 00:38:45 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 1464.0 B, free 1966.1 MB)
15/05/14 00:38:45 INFO MemoryStore: ensureFreeSpace(1121) called with curMem=4108, maxMem=2061647216
15/05/14 00:38:45 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1121.0 B, free 1966.1 MB)
15/05/14 00:38:45 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:58667 (size: 1121.0 B, free: 1966.1 MB)
15/05/14 00:38:45 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/05/14 00:38:45 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:839
15/05/14 00:38:45 INFO DAGScheduler: Submitting 5 missing tasks from Stage 1 (ScalaEsRDD[0] at RDD at AbstractEsRDD.scala:17)
15/05/14 00:38:45 INFO TaskSchedulerImpl: Adding task set 1.0 with 5 tasks
15/05/14 00:38:45 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 5, localhost, ANY, 3352 bytes)
15/05/14 00:38:45 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 6, localhost, ANY, 3352 bytes)
15/05/14 00:38:45 INFO Executor: Running task 0.0 in stage 1.0 (TID 5)
15/05/14 00:38:45 INFO Executor: Running task 1.0 in stage 1.0 (TID 6)
15/05/14 00:40:00 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
15/05/14 00:40:00 INFO HttpMethodDirector: Retrying request

The weird part: when I sbt package the project, deploy it to the ES server, and run it there with spark-submit, everything works with no waiting and no error messages. I'm sure I'm missing something, but I can't figure out what. >_<

Thanks for your help!!
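P.S. One clue I notice in the log: each task first spends about five minutes timing out against 172.31.14.100 (the instance's EC2-internal address) and only then falls over to the public 52.68.202.80, so the connector still seems to learn the node's internal publish address even with es.nodes.discovery set to false. Here is a little diagnostic I can run from the laptop to compare the two addresses (just a sketch; probe is my own helper name, and the IPs are the two from the log above):

```shell
# Compare reachability of the two addresses the NetworkClient tries.
probe() {
  # -m 3 caps each attempt at 3 seconds instead of waiting for the
  # OS-level connect timeout (the ~75 s gaps visible in the log)
  if curl -s -m 3 -o /dev/null "http://$1:9200/"; then
    echo "$1 reachable"
  else
    echo "$1 unreachable"
  fi
}

probe 52.68.202.80    # the public IP I set in es.nodes
probe 172.31.14.100   # the EC2-internal IP the connector discovered and tries first
```

Presumably both addresses answer from inside EC2, which would explain why the same jar works when submitted on the server itself. If the private address is indeed unreachable from outside, the es.nodes.wan.only option in elasticsearch-hadoop (or making the node publish a reachable address) might be worth a look, though I'm not sure which versions have it.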