Re: Input on PTD dataset results

Sean Owen Mon, 26 Apr 2010 12:19:33 -0700

If splitting the data into files is useful and necessary, and I agree
that keeping file sizes under a GB sounds nice, then it's got to be
split somehow. Might as well split on some natural dimension (user ID
or something) rather than randomly chunking. If distribution is a
concern, splitting this way makes it no worse; if it isn't, the
natural split is a convenience.


On Mon, Apr 26, 2010 at 8:14 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Exactly.  I would find skewed data a pain in the butt for statistical analysis.
>
