Hi,

In our production Cassandra system we are observing that the time taken by
the same PIG script keeps increasing every day. The script reads one day's
worth of data at a time from a Cassandra column family. The number of rows
the script is expected to return is almost the same every day; however, the
number of rows we store in Cassandra grows every day. We haven't changed the
default multiquery setting, so multiquery is enabled.

Could this increase in PIG script execution time be related to the growing
number of rows stored in Cassandra?

Related to this, I was trying to understand the behavior of the LOAD
statement. Does the LOAD statement read all the data from Cassandra and only
then apply the required filter conditions? If so, the increase in execution
time could be attributed to the extra time needed to read the ever-growing
data set in Cassandra.
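
To make the question concrete, here is a toy Python model (not Pig or
Cassandra code) of the two behaviors I have in mind. Row, build,
full_scan_then_filter and per_day_lookup are hypothetical names, and the
volume of 1,000 rows per day is made up. If LOAD reads everything and the
filter runs afterwards, the number of rows read grows with the total stored,
even though the number of rows returned stays constant; if the filter could
be applied at the storage layer, the rows read would track only the
requested day.

```python
from dataclasses import dataclass
from datetime import date, timedelta


# Hypothetical row type: a row key tagged with the day it was written.
@dataclass(frozen=True)
class Row:
    key: str
    day: date


def full_scan_then_filter(rows, target_day):
    """Model of a LOAD that reads every stored row, followed by a FILTER.
    Returns (matching rows, number of rows read)."""
    read = 0
    matches = []
    for row in rows:                 # every stored row is read...
        read += 1
        if row.day == target_day:    # ...and only then filtered
            matches.append(row)
    return matches, read


def per_day_lookup(rows_by_day, target_day):
    """Model of a read that only touches the requested day's rows
    (as if the filter were pushed down to the storage layer)."""
    matches = rows_by_day.get(target_day, [])
    return matches, len(matches)


def build(days, rows_per_day, start=date(2013, 1, 1)):
    """Generate `days` days of synthetic rows, both as a flat list
    and grouped by day."""
    rows, by_day = [], {}
    for d in range(days):
        day = start + timedelta(days=d)
        day_rows = [Row(f"{day}-{i}", day) for i in range(rows_per_day)]
        rows.extend(day_rows)
        by_day[day] = day_rows
    return rows, by_day


# Query the most recent day as the total amount of stored data grows.
for days in (30, 60, 90):
    rows, by_day = build(days, 1000)
    last_day = date(2013, 1, 1) + timedelta(days=days - 1)
    scan_matches, scan_read = full_scan_then_filter(rows, last_day)
    day_matches, day_read = per_day_lookup(by_day, last_day)
    print(days, len(scan_matches), scan_read, len(day_matches), day_read)
```

Under these assumptions both approaches return 1,000 rows each time, but the
full scan reads 30,000, 60,000 and then 90,000 rows, while the per-day lookup
reads 1,000 rows every time.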

We are also working on a suitable archival mechanism for our data so that
the total number of stored rows is always kept at an optimum count. This
should also help us keep the PIG script execution time roughly constant from
day to day.
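
The archival idea can be sketched the same way. This is again a hypothetical
Python model, not a Cassandra feature: RETENTION_DAYS, ingest_day and the
hot/archive dictionaries are assumptions. Keeping only the last N days in the
live column family bounds the number of rows a full scan has to read at N
times the daily volume, no matter how long the system has been running.

```python
from collections import OrderedDict

# Hypothetical retention policy: keep only the most recent RETENTION_DAYS
# days of rows in the "hot" store and move older days to an archive.
RETENTION_DAYS = 30


def ingest_day(hot, archive, day_index, rows, retention=RETENTION_DAYS):
    """Add one day's rows, then archive any day outside the window."""
    hot[day_index] = rows
    while len(hot) > retention:
        # Days are inserted in order, so the first item is the oldest day.
        oldest_day, oldest_rows = hot.popitem(last=False)
        archive[oldest_day] = oldest_rows


hot, archive = OrderedDict(), {}
for day in range(100):
    ingest_day(hot, archive, day, [f"row-{day}-{i}" for i in range(1000)])

hot_rows = sum(len(r) for r in hot.values())
print(len(hot), hot_rows, len(archive))  # 30 30000 70
```

After 100 simulated days the hot store still holds only 30 days (30,000
rows), so a scan over it would cost the same as it did on day 30.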

Please advise.

Thanks,

Badri