partition by category

2015-04-08 Thread SiMaYunRui
Hi folks, I am writing to ask how to filter and partition a set of files through Spark. I have N big files that cannot fit on a single machine. Each line of the files starts with a category (say Sport, Food, etc.), and there are actually fewer than 100 categories. I need a

RE: Percentile example

2015-02-17 Thread SiMaYunRui
-- the approximation is only important for distributing work among all executors. Even if the approximation is inaccurate, you'll still correct for it; you will just have unequal partitions. Imran On Sun, Feb 15, 2015 at 9:37 AM, SiMaYunRui myl...@hotmail.com wrote: hello, I am a newbie
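
The point in the reply above, that a rough approximation only affects how evenly the work is split and not the final answer, can be illustrated with a small sketch. This is plain Python, not Spark code; the function name, the bucket and sample sizes, and the nearest-rank percentile definition are all assumptions made for illustration. Split points are guessed from a sample, every value is routed to a range bucket, and the exact bucket sizes are then used to locate the true percentile.

```python
import bisect
import math
import random


def exact_percentile_via_buckets(values, p, num_buckets=4, sample_size=20, seed=0):
    """Exact nearest-rank percentile found through approximate range buckets.

    Hypothetical helper: the buckets stand in for partitions spread over executors.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")

    # Step 1: approximate split points taken from a small random sample.
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    splits = [sample[len(sample) * i // num_buckets] for i in range(1, num_buckets)]

    # Step 2: route each value to a bucket by range. Bad split points only
    # make the buckets uneven; the buckets stay ordered relative to each other.
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        buckets[bisect.bisect_right(splits, v)].append(v)

    # Step 3: correct for the approximation using the exact bucket sizes.
    target = math.ceil(p * len(values) / 100)  # 1-based nearest rank
    seen = 0
    for bucket in buckets:
        if seen + len(bucket) >= target:
            # Only the bucket holding the target rank has to be sorted.
            return sorted(bucket)[target - seen - 1]
        seen += len(bucket)
    raise AssertionError("unreachable: target rank is always within the data")
```

Calling it with a deliberately tiny sample (for example `sample_size=2`) produces lopsided buckets but still returns the same value as sorting the whole list, which is the behaviour the reply describes.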

RE: Percentile example

2015-02-17 Thread SiMaYunRui
-NAME GROUP BY FIELD1, FIELD2;” JavaSchemaRDD result = hsc.hql(hql); List<Row> grp = result.collect(); for (int z = 2; z < row.length(); z++) { // Do something with the results } Curt From: SiMaYunRui myl...@hotmail.com Date: Sunday, February 15, 2015 at 10:37 AM To: user

Percentile example

2015-02-15 Thread SiMaYunRui
Hello, I am a newbie to Spark and trying to figure out how to get a percentile against a big data set. I googled this topic but did not find any very useful code example or explanation. It seems that I can use the sortByKey transformation to get my data set in order, but I am not sure how I can
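
As a minimal sketch of the sort-then-index idea raised in this question: once the data is in order, a percentile is just the element at a computed rank. The code below is plain Python on an in-memory list, so `sorted` only stands in for a distributed sort, and the nearest-rank definition of a percentile is an assumption (other definitions interpolate between neighbours). The function name is hypothetical.

```python
import math


def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank rule."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)                     # stand-in for the distributed sort
    rank = math.ceil(p * len(ordered) / 100)     # 1-based rank of the percentile
    return ordered[rank - 1]


scores = [40, 15, 50, 20, 35]
median = percentile_nearest_rank(scores, 50)     # third smallest of five values
```

On a data set too large for one machine, the same rank calculation applies; what changes is that the sorted position of each element has to be looked up across partitions instead of by list index.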