Saurav,

Not that limited, but yes. Another example is in order. Say I have something like this:
projected_data = FOREACH data GENERATE com.example.udfs.foo(7, 37, 'https', fields#'bar') as bat;

This sort of thing would be vastly better:
projected_data = FOREACH data GENERATE com.example.udfs.foo(FOO_COMMAND_CODE, MAX_FIELD_LENGTH, SCHEME, fields#'bar') as bat;

I know Pig isn't a real programming language, so maybe I'm asking for too much. But it's so brittle, and as the number of Pig scripts grows, the odds of a change breaking a bunch of stuff increase exponentially.

--- Eric Wadsworth

On 09/29/2010 11:06 AM, Saurav Datta wrote:
Hi Eric,

As I understand it, you would like to define the value used in the filter at run time, with that value taken from a file.
Am I correct?

Regards,
Saurav

On Sep 29, 2010, at 10:00 AM, Eric Wadsworth wrote:

Hi folks!

I'm brand new to this list, so apologies if this is an inappropriate newbie question, or is otherwise incorrect, but here goes.

I'm working with a bunch of Pig scripts, and we're adding new ones almost daily. They are getting more and more complex. The problem is exacerbated by the proliferation of magic numbers throughout them. Speaking as a software engineer, these are driving me nuts! The code is quite brittle. There seems to be no way to centralize logic or even values.

For a simple example:
filtered_stuff = FILTER stuff by record_type == 23;

I'd prefer:
filtered_stuff = FILTER stuff by record_type == RECORD_TYPE_ALPHA;

Where RECORD_TYPE_ALPHA is defined in some other file that the pig script consumes.

Sounds rather like the old C-style header files would be in order...

Am I missing something obvious here? How do you guys handle this problem? (We're using Pig 0.6 and are just starting to transition to Pig 0.7.)

Thanks! --- Eric Wadsworth

