Hey Jonathan,

With respect to handling arrays (as opposed to Collection<T>s), there
are lots of different serialization approaches you could take
depending on what you are planning to do with the output, and so my
general inclination has been to not impose a particular approach on
people within the core PTypeFamily stuff. That may or may not be the
right solution, and it's certainly something we need feedback on.

My advice is to think about how you want to consume or output the
array and then use the _derived_ method on the PTypeFamily of your
choice to convert from some base format (like a String or a byte
array) to the float array you want to work with. Here's some code that
I just spun up that uses Avro's bytes() type as the base for working
with a float array:

  private static final MapFn<ByteBuffer, float[]> IN =
      new MapFn<ByteBuffer, float[]>() {
    @Override
    public float[] map(ByteBuffer input) {
      float[] f = new float[(input.limit() - input.position()) / 4];
      input.asFloatBuffer().get(f);
      return f;
    }
  };
  private static final MapFn<float[], ByteBuffer> OUT =
      new MapFn<float[], ByteBuffer>() {
    @Override
    public ByteBuffer map(float[] input) {
      byte[] bb = new byte[input.length * 4];
      for (int i = 0; i < input.length; i++) {
        int f = Float.floatToRawIntBits(input[i]);
        bb[4*i] = (byte)((f >> 24) & 0xff);
        bb[4*i + 1] = (byte)((f >> 16) & 0xff);
        bb[4*i + 2] = (byte)((f >> 8) & 0xff);
        bb[4*i + 3] = (byte)((f >> 0) & 0xff);
      }
      return ByteBuffer.wrap(bb);
    }
  };

  public static final PType<float[]> FLOAT_ARRAYS =
      Avros.derived(float[].class, IN, OUT, Avros.bytes());

  @Test
  public void testInAndOut() throws Exception {
    float[] data = new float[] { 17.29f, 2.0f, 3.14f };
    float[] out = IN.map(OUT.map(data));
    assertEquals(data.length, out.length);
    for (int i = 0; i < out.length; i++) {
      assertEquals(data[i], out[i], 0.0001f);
    }
  }
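
If the manual bit-shifting in OUT feels error-prone, the same round trip
can probably be written with plain java.nio views. This is only a sketch
under a few assumptions: the class name FloatArrayCodec and the helper
names toBytes and fromBytes are made up for illustration and are not part
of Crunch, and the byte order is left at ByteBuffer's default (big-endian),
which matches the shifts above.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper (not part of Crunch): float[] <-> ByteBuffer using NIO views.
final class FloatArrayCodec {

  // Encode each float as 4 bytes, big-endian (ByteBuffer's default order).
  public static ByteBuffer toBytes(float[] values) {
    ByteBuffer buf = ByteBuffer.allocate(values.length * Float.BYTES);
    // Writing through the float view does not move buf's own position,
    // so the returned buffer still spans all of the written bytes.
    buf.asFloatBuffer().put(values);
    return buf;
  }

  // Decode the remaining bytes of the buffer back into a float array.
  public static float[] fromBytes(ByteBuffer buf) {
    float[] values = new float[buf.remaining() / Float.BYTES];
    // Reading through the view likewise leaves buf's position untouched.
    buf.asFloatBuffer().get(values);
    return values;
  }

  public static void main(String[] args) {
    float[] data = {17.29f, 2.0f, 3.14f};
    float[] out = fromBytes(toBytes(data));
    System.out.println(Arrays.equals(data, out));
  }
}
```

The bodies of toBytes and fromBytes could stand in for the bodies of the
two map() methods above; the Avros.derived call would stay the same.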

Best,
Josh

On Tue, Jul 31, 2012 at 5:34 PM, Jonathan Dinu <[email protected]> wrote:
> Hi,
>
> I am pretty new to Crunch and I am having a difficult time working around 
> PTypes specifically as the third argument to parallelDo when serializing the 
> collection.
>
> I have been using some of the PTypeFamily methods for primitives like Strings 
> but I am trying to create a PCollection output that contains Arrays of 
> Floats.  What I want out is PCollection<Float[]> but I cannot seem to coerce
> the right PType.  I am not sure if this is the best approach or if I should
> be using a different type to store my records in the collection.  Or if there 
> is a way to register custom types and define how they should be serialized to 
> disk.
>
> Any help is greatly appreciated.
>
> Thanks,
> Jonathan
