Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by AlexSmith:
http://wiki.apache.org/hadoop/Hive/LanguageManual/SortBy

The comment on the change is:
adds example for numeric sorting

------------------------------------------------------------------------------
  }}}

- === How to do Order By? ===
+ === Simulating Order By ===
  We can set the number of reducers to 1, to make sure we have the same result as ''ORDER BY''.

@@ -50, +50 @@

  }}}

  This sometimes will make the reducer a performance bottleneck. A lot of cases the user only wants to see the top N rows where N is a small number. In this case, we can use LIMIT clause. We don't have an example here but users are encouraged to provide one.
+
+ === Setting Types for Sort By ===
+
+ After a transform, variable types are generally considered to be strings, meaning that numeric data will be sorted lexicographically. To overcome this, a second SELECT statement with casts can be used before using SORT BY.
+
+ {{{
+ FROM (FROM (FROM src
+             SELECT TRANSFORM(value)
+             USING 'mapper'
+             AS value, count) mapped
+       SELECT cast(value as double) AS value, cast(count as int) AS count
+       SORT BY value, count) sorted
+ SELECT TRANSFORM(value, count)
+ USING 'reducer'
+ AS whatever
+ }}}

  == Syntax of Cluster By and Distribute By ==
