Hi folks,
I am writing to ask how to filter and partition a set of files through Spark.
The situation is that I have N big files (too large to fit on a single machine), and
each line of the files starts with a category (say Sport, Food, etc.), while
there are fewer than 100 categories in total. I need a
-- the approximation is only important
for distributing work among all the executors. Even if the approximation is
inaccurate, you will still correct for it later; you will just end up with unequal partitions.
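Imran's idea can be sketched in plain Java (this is a hypothetical helper, not part of the Spark API): take approximate per-category counts (e.g. from a small sample of the files) and greedily assign each category to the currently least-loaded partition. A poor estimate only skews the balance; the assignment itself stays valid.

```java
import java.util.*;

public class CategoryPartitioning {
    // Greedily assign each category to the currently least-loaded partition.
    // The counts only need to be approximate (e.g. taken from a sample):
    // a bad estimate just yields unequal partitions, not wrong results.
    static Map<String, Integer> assign(Map<String, Long> approxCounts, int numPartitions) {
        long[] load = new long[numPartitions];            // load assigned so far
        PriorityQueue<Integer> byLoad = new PriorityQueue<>(
                Comparator.comparingLong(p -> load[p]));  // min-heap by load
        for (int p = 0; p < numPartitions; p++) byLoad.add(p);

        Map<String, Integer> assignment = new HashMap<>();
        // Place the biggest categories first for a tighter balance.
        approxCounts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .forEach(e -> {
                    int p = byLoad.poll();                // least-loaded partition
                    assignment.put(e.getKey(), p);
                    load[p] += e.getValue();
                    byLoad.add(p);                        // re-insert with updated load
                });
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("Sport", 90L);
        counts.put("Food", 60L);
        counts.put("News", 50L);
        counts.put("Tech", 10L);
        System.out.println(assign(counts, 2));
    }
}
```

The resulting category-to-partition map could then back a custom `Partitioner`, so all lines of one category land in the same partition.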
Imran

On Sun, Feb 15, 2015 at 9:37 AM, SiMaYunRui myl...@hotmail.com wrote:
hello,
I am a newbie
-NAME GROUP BY FIELD1, FIELD2;”
JavaSchemaRDD result = hsc.hql(hql);
List<Row> grp = result.collect();
for (Row row : grp) {
    for (int z = 2; z < row.length(); z++) {
        // Do something with the results
    }
}
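Once the grouped values are small enough to collect to the driver, the percentile itself can be computed locally. Here is a plain-Java sketch of the nearest-rank method (the `values` array is a hypothetical stand-in for the numbers pulled out of the collected rows):

```java
import java.util.Arrays;

public class Percentile {
    // Nearest-rank percentile: sort the values, then take the element at
    // 1-based rank ceil(p * n / 100), for p in (0, 100].
    static double percentile(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        double[] data = {15, 20, 35, 40, 50};
        System.out.println(percentile(data, 40));
    }
}
```

For data too large to collect, the same idea can be done distributed: sort with `sortByKey`, number the elements with `zipWithIndex`, and look up the element at the target rank.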
Curt
From: SiMaYunRui myl...@hotmail.com
Date: Sunday, February 15, 2015 at 10:37 AM
To: user
hello,
I am a newbie to Spark and am trying to figure out how to compute a percentile over a
big data set. I googled this topic but did not find any useful code
example or explanation. It seems I can use the sortByKey transformation to get my
data set in order, but I am not sure how I can