spark optimized pagination

2018-06-09 Thread onmstester onmstester
Hi,

I'm using Spark on top of Cassandra as the CRUD backend of a RESTful application.

Most of the REST APIs retrieve a huge amount of data from Cassandra and do a lot 
of aggregation on it in Spark, which takes a few seconds.



Problem: sometimes the result is such a big list that the client browser throws a 
"stop script" warning, so we have to paginate the result on the server side. But it 
would be annoying for the user to wait several seconds on every page for the 
Cassandra/Spark processing.



Current dummy solution: for now I was thinking about assigning a UUID to each 
request, which would be sent back and forth between the server side and the client 
side. The first time a REST API is invoked, the result would be saved in a temp 
table, and subsequent similar requests (requests for the next pages) would fetch 
the result from that temp table instead of going through the usual flow of 
retrieving from Cassandra plus aggregation in Spark, which takes some time. When a 
memory limit is reached, old results would be deleted.
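A rough sketch of what I have in mind is below (the ResultCache object, the page 
slicing, and the assumption of a single long-running SparkSession shared by the 
REST layer are all mine; Spark's built-in persist()/unpersist() is the only 
Spark-specific piece):

import java.util.UUID
import scala.collection.concurrent.TrieMap
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.storage.StorageLevel

object ResultCache {
  // request-id -> persisted aggregation result (simple in-memory registry)
  private val cache = TrieMap.empty[String, DataFrame]

  // First request: run the Cassandra read + Spark aggregation once, persist
  // the result, and return an id the client sends back for the next pages.
  def register(result: DataFrame): String = {
    val id = UUID.randomUUID().toString
    cache.put(id, result.persist(StorageLevel.MEMORY_AND_DISK))
    id
  }

  // Later pages: slice the already-persisted result instead of recomputing it.
  def page(id: String, pageNo: Int, pageSize: Int): Option[Array[Row]] =
    cache.get(id).map { df =>
      df.rdd.zipWithIndex()
        .filter { case (_, i) =>
          i >= pageNo.toLong * pageSize && i < (pageNo + 1).toLong * pageSize }
        .map(_._1)
        .collect()
    }

  // Drop an old result once the memory limit is reached (or after a TTL).
  def evict(id: String): Unit = cache.remove(id).foreach(_.unpersist())
}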



Is there any clean, built-in caching strategy in Spark to handle such scenarios?










Re: Spark / Scala code not recognising the path?

2018-06-09 Thread Abhijeet Kumar
The situation is completely different from what you are thinking. OK,
thanks for your time. From now on I'll figure this out myself. Thank you again!

On Sat, 9 Jun 2018, 13:27 Jörn Franke wrote:

> Why don’t you write the final name from the start?
> Ie save as the file it should be named.
>
> On 9. Jun 2018, at 09:44, Abhijeet Kumar wrote:
>
> I need to rename the file. I can write a separate program for this, I
> think.
>
> Thanks,
> Abhijeet Kumar
>
> On 09-Jun-2018, at 1:10 PM, Jörn Franke  wrote:
>
> That would be an anti pattern and would lead to bad software.
> Please don’t do it for the sake of the people that use your software.
> What do you exactly want to achieve with the information if the file
> exists or not?
>
> On 9. Jun 2018, at 08:34, Abhijeet Kumar wrote:
>
> Can you please tell the estimated time. So, that my program will wait for
> that time period.
>
> Thanks,
> Abhijeet Kumar
>
> On 09-Jun-2018, at 12:01 PM, Jörn Franke  wrote:
>
> You need some time until the information of the file creation is
> propagated.
>
> On 9. Jun 2018, at 08:07, Abhijeet Kumar wrote:
>
> I'm modifying a CSV file which is inside HDFS and finally putting it back
> to HDFS in Spark.
>
> val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
> csv_file.coalesce(1).write
>   .format("csv")
>   .mode("overwrite")
>   .save("hdfs://localhost:8020/data/temp_insight")
> Thread.sleep(15000)
> println(fs.exists(new Path("/data/temp_insight")))
>
> Output:
>
> false
>
> while I have stopped the thread for 15 sec, I have checked my hdfs using
> command
>
> hdfs dfs -ls /data/temp_insight
>
> Output:
>
> 18/06/08 17:48:18 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> -rw-r--r--   3 abhijeet supergroup    0 2018-06-08 17:48 /data/temp_insight/_SUCCESS
> -rw-r--r--   3 abhijeet supergroup  201 2018-06-08 17:48 /data/temp_insight/part-0-7bffb826-f18d-4022-b089-da85565525b7-c000.csv
>
> To cross verify whether it is taking the path of hdfs or not I have added
> one more println statement in my code, providing the path which is already
> there in HDFS. It's showing true in that case.
>
> So, what could be the reason?
> Thanks,
>
> Abhijeet Kumar
>
>
>
>


Re: Spark / Scala code not recognising the path?

2018-06-09 Thread Jörn Franke
Why don’t you write the final name from the start?
I.e., save it directly under the name it should have.

> On 9. Jun 2018, at 09:44, Abhijeet Kumar  wrote:
> 
> I need to rename the file. I can write a separate program for this, I think.
> 
> Thanks,
> Abhijeet Kumar 
>> On 09-Jun-2018, at 1:10 PM, Jörn Franke  wrote:
>> 
>> That would be an anti pattern and would lead to bad software.
>> Please don’t do it for the sake of the people that use your software.
>> What do you exactly want to achieve with the information if the file exists 
>> or not?
>> 
>>> On 9. Jun 2018, at 08:34, Abhijeet Kumar wrote:
>>> 
>>> Can you please tell the estimated time. So, that my program will wait for 
>>> that time period.
>>> 
>>> Thanks,
>>> Abhijeet Kumar
 On 09-Jun-2018, at 12:01 PM, Jörn Franke  wrote:
 
 You need some time until the information of the file creation is 
 propagated.
 
> On 9. Jun 2018, at 08:07, Abhijeet Kumar wrote:
> 
> I'm modifying a CSV file which is inside HDFS and finally putting it back 
> to HDFS in Spark.
> val fs=FileSystem.get(spark.sparkContext.hadoopConfiguration)
> csv_file.coalesce(1).write
>   .format("csv")
>   .mode("overwrite")
>   .save("hdfs://localhost:8020/data/temp_insight")
> Thread.sleep(15000)
> println(fs.exists(new Path("/data/temp_insight")))
> Output:
> 
> false
> while I have stopped the thread for 15 sec, I have checked my hdfs using 
> command
> 
> hdfs dfs -ls /data/temp_insight
> Output:
> 
> 18/06/08 17:48:18 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes 
> where applicable
> -rw-r--r--   3 abhijeet supergroup  0 2018-06-08 17:48 
> /data/temp_insight/_SUCCESS
> -rw-r--r--   3 abhijeet supergroup  201 2018-06-08 17:48 
> /data/temp_insight/part-0-7bffb826-f18d-4022-b089-da85565525b7-c000.csv
> To cross verify whether it is taking the path of hdfs or not I have added 
> one more println statement in my code, providing the path which is 
> already there in HDFS. It's showing true in that case.
> 
> So, what could be the reason?
> 
> Thanks,
> 
> Abhijeet Kumar
>>> 
> 


Re: Spark / Scala code not recognising the path?

2018-06-09 Thread Abhijeet Kumar
I need to rename the file. I can write a separate program for this, I think.
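Something like this is what I'm thinking of (just a rough sketch with the Hadoop 
FileSystem API; the target name insight.csv is only an example):

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

val outputDir = new Path("hdfs://localhost:8020/data/temp_insight")
val finalPath = new Path("hdfs://localhost:8020/data/insight.csv") // example target name

// Bind the FileSystem to the hdfs:// URI explicitly so the rename definitely
// runs against HDFS and not whatever the default file system happens to be.
val fs = FileSystem.get(outputDir.toUri, spark.sparkContext.hadoopConfiguration)

// coalesce(1) leaves exactly one part-*.csv inside the output directory;
// find it and rename it to the final name.
val partFile = fs.globStatus(new Path(outputDir, "part-*.csv")).head.getPath
fs.rename(partFile, finalPath)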

Thanks,
Abhijeet Kumar 
> On 09-Jun-2018, at 1:10 PM, Jörn Franke  wrote:
> 
> That would be an anti pattern and would lead to bad software.
> Please don’t do it for the sake of the people that use your software.
> What do you exactly want to achieve with the information if the file exists 
> or not?
> 
> On 9. Jun 2018, at 08:34, Abhijeet Kumar wrote:
> 
>> Can you please tell the estimated time. So, that my program will wait for 
>> that time period.
>> 
>> Thanks,
>> Abhijeet Kumar
>>> On 09-Jun-2018, at 12:01 PM, Jörn Franke wrote:
>>> 
>>> You need some time until the information of the file creation is propagated.
>>> 
>>> On 9. Jun 2018, at 08:07, Abhijeet Kumar wrote:
>>> 
 I'm modifying a CSV file which is inside HDFS and finally putting it back 
 to HDFS in Spark.
 val fs=FileSystem.get(spark.sparkContext.hadoopConfiguration)
 csv_file.coalesce(1).write
   .format("csv")
   .mode("overwrite")
   .save("hdfs://localhost:8020/data/temp_insight")
 Thread.sleep(15000)
 println(fs.exists(new Path("/data/temp_insight")))
 Output:
 
 false
 while I have stopped the thread for 15 sec, I have checked my hdfs using 
 command
 
 hdfs dfs -ls /data/temp_insight
 Output:
 
 18/06/08 17:48:18 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 -rw-r--r--   3 abhijeet supergroup  0 2018-06-08 17:48 
 /data/temp_insight/_SUCCESS
 -rw-r--r--   3 abhijeet supergroup  201 2018-06-08 17:48 
 /data/temp_insight/part-0-7bffb826-f18d-4022-b089-da85565525b7-c000.csv
 To cross verify whether it is taking the path of hdfs or not I have added 
 one more println statement in my code, providing the path which is already 
 there in HDFS. It's showing true in that case.
 
 So, what could be the reason?
 
 Thanks,
 
 Abhijeet Kumar
>> 



Re: Spark / Scala code not recognising the path?

2018-06-09 Thread Jörn Franke
That would be an anti-pattern and would lead to bad software.
Please don’t do it, for the sake of the people who use your software.
What exactly do you want to achieve with the information about whether the file 
exists or not?

> On 9. Jun 2018, at 08:34, Abhijeet Kumar  wrote:
> 
> Can you please tell the estimated time. So, that my program will wait for 
> that time period.
> 
> Thanks,
> Abhijeet Kumar
>> On 09-Jun-2018, at 12:01 PM, Jörn Franke  wrote:
>> 
>> You need some time until the information of the file creation is propagated.
>> 
>>> On 9. Jun 2018, at 08:07, Abhijeet Kumar wrote:
>>> 
>>> I'm modifying a CSV file which is inside HDFS and finally putting it back 
>>> to HDFS in Spark.
>>> val fs=FileSystem.get(spark.sparkContext.hadoopConfiguration)
>>> csv_file.coalesce(1).write
>>>   .format("csv")
>>>   .mode("overwrite")
>>>   .save("hdfs://localhost:8020/data/temp_insight")
>>> Thread.sleep(15000)
>>> println(fs.exists(new Path("/data/temp_insight")))
>>> Output:
>>> 
>>> false
>>> while I have stopped the thread for 15 sec, I have checked my hdfs using 
>>> command
>>> 
>>> hdfs dfs -ls /data/temp_insight
>>> Output:
>>> 
>>> 18/06/08 17:48:18 WARN util.NativeCodeLoader: Unable to load native-hadoop 
>>> library for your platform... using builtin-java classes where applicable
>>> -rw-r--r--   3 abhijeet supergroup  0 2018-06-08 17:48 
>>> /data/temp_insight/_SUCCESS
>>> -rw-r--r--   3 abhijeet supergroup  201 2018-06-08 17:48 
>>> /data/temp_insight/part-0-7bffb826-f18d-4022-b089-da85565525b7-c000.csv
>>> To cross verify whether it is taking the path of hdfs or not I have added 
>>> one more println statement in my code, providing the path which is already 
>>> there in HDFS. It's showing true in that case.
>>> 
>>> So, what could be the reason?
>>> 
>>> Thanks,
>>> 
>>> Abhijeet Kumar
> 


Re: Spark / Scala code not recognising the path?

2018-06-09 Thread Abhijeet Kumar
Can you please tell me the estimated time, so that my program can wait for that 
time period?
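In the meantime, instead of a fixed Thread.sleep, I was thinking of polling for the 
_SUCCESS marker with a timeout, roughly like this (the 30-second limit is just my 
own guess):

import org.apache.hadoop.fs.{FileSystem, Path}

// Rough sketch: wait for the _SUCCESS marker the writer creates, up to a
// timeout, instead of sleeping for a fixed 15 seconds.
def waitForOutput(fs: FileSystem, dir: Path, timeoutMs: Long = 30000L): Boolean = {
  val success = new Path(dir, "_SUCCESS")
  val deadline = System.currentTimeMillis() + timeoutMs
  while (!fs.exists(success) && System.currentTimeMillis() < deadline) {
    Thread.sleep(500) // re-check every half second
  }
  fs.exists(success)
}

// e.g. waitForOutput(fs, new Path("/data/temp_insight"))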

Thanks,
Abhijeet Kumar
> On 09-Jun-2018, at 12:01 PM, Jörn Franke  wrote:
> 
> You need some time until the information of the file creation is propagated.
> 
> On 9. Jun 2018, at 08:07, Abhijeet Kumar wrote:
> 
>> I'm modifying a CSV file which is inside HDFS and finally putting it back to 
>> HDFS in Spark.
>> val fs=FileSystem.get(spark.sparkContext.hadoopConfiguration)
>> csv_file.coalesce(1).write
>>   .format("csv")
>>   .mode("overwrite")
>>   .save("hdfs://localhost:8020/data/temp_insight")
>> Thread.sleep(15000)
>> println(fs.exists(new Path("/data/temp_insight")))
>> Output:
>> 
>> false
>> while I have stopped the thread for 15 sec, I have checked my hdfs using 
>> command
>> 
>> hdfs dfs -ls /data/temp_insight
>> Output:
>> 
>> 18/06/08 17:48:18 WARN util.NativeCodeLoader: Unable to load native-hadoop 
>> library for your platform... using builtin-java classes where applicable
>> -rw-r--r--   3 abhijeet supergroup  0 2018-06-08 17:48 
>> /data/temp_insight/_SUCCESS
>> -rw-r--r--   3 abhijeet supergroup  201 2018-06-08 17:48 
>> /data/temp_insight/part-0-7bffb826-f18d-4022-b089-da85565525b7-c000.csv
>> To cross verify whether it is taking the path of hdfs or not I have added 
>> one more println statement in my code, providing the path which is already 
>> there in HDFS. It's showing true in that case.
>> 
>> So, what could be the reason?
>> 
>> Thanks,
>> 
>> Abhijeet Kumar



Re: Spark / Scala code not recognising the path?

2018-06-09 Thread Jörn Franke
You need to allow some time for the information about the file creation to propagate.

> On 9. Jun 2018, at 08:07, Abhijeet Kumar  wrote:
> 
> I'm modifying a CSV file which is inside HDFS and finally putting it back to 
> HDFS in Spark.
> val fs=FileSystem.get(spark.sparkContext.hadoopConfiguration)
> csv_file.coalesce(1).write
>   .format("csv")
>   .mode("overwrite")
>   .save("hdfs://localhost:8020/data/temp_insight")
> Thread.sleep(15000)
> println(fs.exists(new Path("/data/temp_insight")))
> Output:
> 
> false
> while I have stopped the thread for 15 sec, I have checked my hdfs using 
> command
> 
> hdfs dfs -ls /data/temp_insight
> Output:
> 
> 18/06/08 17:48:18 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> -rw-r--r--   3 abhijeet supergroup  0 2018-06-08 17:48 
> /data/temp_insight/_SUCCESS
> -rw-r--r--   3 abhijeet supergroup  201 2018-06-08 17:48 
> /data/temp_insight/part-0-7bffb826-f18d-4022-b089-da85565525b7-c000.csv
> To cross verify whether it is taking the path of hdfs or not I have added one 
> more println statement in my code, providing the path which is already there 
> in HDFS. It's showing true in that case.
> 
> So, what could be the reason?
> 
> Thanks,
> 
> Abhijeet Kumar


Spark / Scala code not recognising the path?

2018-06-09 Thread Abhijeet Kumar
I'm modifying a CSV file that is in HDFS and finally writing it back to HDFS in 
Spark.
val fs=FileSystem.get(spark.sparkContext.hadoopConfiguration)
csv_file.coalesce(1).write
  .format("csv")
  .mode("overwrite")
  .save("hdfs://localhost:8020/data/temp_insight")
Thread.sleep(15000)
println(fs.exists(new Path("/data/temp_insight")))
Output:

false
Even though I paused the thread for 15 seconds, it still prints false. I checked my HDFS using the command:

hdfs dfs -ls /data/temp_insight
Output:

18/06/08 17:48:18 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
-rw-r--r--   3 abhijeet supergroup  0 2018-06-08 17:48 
/data/temp_insight/_SUCCESS
-rw-r--r--   3 abhijeet supergroup  201 2018-06-08 17:48 
/data/temp_insight/part-0-7bffb826-f18d-4022-b089-da85565525b7-c000.csv
To cross-verify whether it is picking up the HDFS path or not, I added one more 
println statement to my code, providing a path that is already there in HDFS. It 
shows true in that case.
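Roughly what I mean by the extra check is below (the variant that binds a 
FileSystem to the explicit hdfs:// URI is just an extra idea of mine to rule out 
the local file system; /data/some_existing_dir stands in for the path that already 
exists):

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

// The same existence checks through the default FileSystem and through a
// FileSystem bound explicitly to the hdfs:// URI used in the save().
val defaultFs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val hdfs = FileSystem.get(new URI("hdfs://localhost:8020"),
                          spark.sparkContext.hadoopConfiguration)

println(defaultFs.exists(new Path("/data/some_existing_dir"))) // placeholder for a path already in HDFS; prints true
println(defaultFs.exists(new Path("/data/temp_insight")))      // the freshly written directory; prints false for me
println(hdfs.exists(new Path("/data/temp_insight")))           // same check against the explicit hdfs:// URI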

So, what could be the reason?

Thanks,

Abhijeet Kumar