pig-commits  

[Pig Wiki] Update of "FAQ" by AmirYoussefi

Apache Wiki
Wed, 30 Apr 2008 15:44:13 -0700

The following page has been changed by AmirYoussefi:
http://wiki.apache.org/pig/FAQ

------------------------------------------------------------------------------
- ---+!! PigFAQ
+ Pig FAQ
  
- ---++++ 1. I'm using PigStorage to parse my input files. Can I make it use 
control characters as delimiters?
+ 1. I'm using PigStorage to parse my input files. Can I make it use control 
characters as delimiters?
  
  A. Yes. Examples: PigStorage('\u0001') for Ctrl+A or '\u007C' for this 
character: |
  
- 
- ---++++2. Can I do a numerical comparison while filtering?
+ 2. Can I do a numerical comparison while filtering?
  
  A. Yes, you can choose between numerical and string comparison. For numerical 
comparison use the operators ==, !=, < etc., and for string comparison use eq, 
neq etc. 
  
- ---++++3. How do I make my jobs run on multiple machines?
+ 3. How do I make my jobs run on multiple machines?
  
  A. Use the PARALLEL clause. For example =C = JOIN A by url, B by url PARALLEL 
50=
  
- ---++++4. Does Pig support NULLs?
+ 4. Does Pig support NULLs?
  
  A. Pig currently has no support for NULL values but it is on the roadmap.
  
- ---++++5. Does pig support regular expressions?
+ 5. Does pig support regular expressions?
  
  A. Pig supports regular expression matching via the =matches= keyword. It 
uses java.util.regex matching, which means your pattern has to match the entire 
string (i.e., if your string is "hi fred" and you want to find "fred", you have 
to give a pattern of ".*fred", not "fred").
  
- ---++++6. How to prevent failure if some records don't have the needed number 
of columns.
+ 6. How to prevent failure if some records don't have the needed number of 
columns.
  
  You can filter away those records by including the following in your Pig 
program:
  
- <verbatim>
+ 
  A = load 'foo' using PigStorage('\t');
  B = FILTER A BY ARITY(*) >= 5;
  .....
- </verbatim>
+ 
  
  This code would drop all the records that have fewer than 5 columns.
  
- ---++++7. Is there any difference between == and eq for numeric comparisons?
+ 7. Is there any difference between == and eq for numeric comparisons?
  
  For equality, there is no difference as long as you stay with integers. 
However, 11.0 and 11 will be equal with == but not with eq. 
  
- ---++++8. Is there an easy way for me to figure out how many rows exists in a 
dataset from its alias?
+ 8. Is there an easy way for me to figure out how many rows exists in a 
dataset from its alias?
  
  You can run the following set of commands:
  
- <verbatim>
+ 
  a = load 'bla' ... ;
+ 
  b = group a all;
+ 
  c = foreach b generate COUNT(a.$0);
- </verbatim>
+ 
  
  This is equivalent to select count(*) in SQL.
  
- ---++++9. Does Pig allow grouping on expressions
+ 9. Does Pig allow grouping on expressions
  
  Currently, Pig only allows grouping on data fields, not on expressions. 
Grouping on expressions is on our roadmap. Stay tuned!
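
The whole-string matching described in answer 5 can be tried directly against java.util.regex. The following is a minimal sketch, not Pig's own code: the class name FullMatchDemo and its two helper methods are hypothetical names chosen for illustration, and it only shows how a full match differs from a substring search in plain Java.

```java
import java.util.regex.Pattern;

public class FullMatchDemo {
    // Hypothetical helper: true only when the regex matches the ENTIRE input,
    // which is the behaviour the FAQ describes for the matches keyword.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Hypothetical helper: true when the regex matches ANY substring of the input.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String s = "hi fred";
        System.out.println(fullMatch("fred", s));     // false: "hi " is left unmatched
        System.out.println(fullMatch(".*fred", s));   // true: ".*" absorbs the prefix
        System.out.println(partialMatch("fred", s));  // true: substring search
    }
}
```

If a pattern that looks correct never matches, a likely cause is a missing leading or trailing ".*" around the part of interest.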