Re: [CSV] Accessing a subset of the available headers (Was: Re: [CSV] Headers and the first record)

Gary Gregory Wed, 31 Jul 2013 12:11:59 -0700

On Wed, Jul 31, 2013 at 2:38 PM, Benedikt Ritter <[email protected]> wrote:


> 2013/7/31 Gary Gregory <[email protected]>
>
> > On Wed, Jul 31, 2013 at 10:42 AM, Benedikt Ritter <[email protected]> wrote:
> >
> > > <snip>
> > >
> > > >> A use case I have now is a CSV file with a lot of columns (~90) but I
> > > >> only care about a small subset of the columns (~10). I'd like to be
> > > >> able to say withHeader(Set) where the Set may be a subset of the
> > > >> actual column names in the header line. This is different from
> > > >> withHeader(String[]) because the names in the Set must match the
> > > >> names in the header record.
> > >
> > > > >
> > > > > What you are talking about sounds more like a view or a projection
> > > > > of the actual content being parsed.
> > > > > Do we really need this for 1.0 or can it be postponed?
> > > >
> > > > This is a real scenario and a real need, not some imaginary
> > > > complication ;)
> > > >
> > > > Even if it is not implemented for 1.0, we should talk about how it
> > > > should be done such that it fits in and does not cause API problems
> > > > later. And if I can get it done by then, then that much the better.
> > > >
> > >
> > > Okay, then let's discuss this on a new thread :-)
> > >
> > > As I've said, I think we should not push too much into
> > > withHeader(String...). Maybe this is some sort of view, where you can
> > > pass a parser and the headers you are interested in and it will return an
> > > Iterable<CSVRecord> (or CSVParser) that just gives access to the
> > > specified headers you are interested in?
> > >
> > > Would it be possible to give a code example of what you have to do with
> > > the current API in your use case and what you want?
> > >
> >
> > I am switching to withHeader() with no arg (same as a new String[]{}) and
> > let the parser guess the headers and then pray that the names match
> > between the app and the files. Which is just as unsafe as forcing the
> > headers in fixed order on the parser because the column order might have
> > changed. Ideally, the column order should not matter, which it does not
> > when you do a record.get(String), which is nice.
> >
> > Calling withHeader() with no args is less brittle than calling it with 90
> > args. The benefit is that the column order in the file can change without
> > affecting the app, which is good. I could use a little more
> > bullet-proofing by making the column names optionally case-insensitive,
> > but that's a different feature.
> >
> > Ideally, I want to define the column names in the app as a simple Java
> > enum, then use an enum as a record key. That does not work for column
> > names that have spaces in them as mine do, so it's back to classic static
> > final Strings as keys. I could create a fancier custom enum but it's not
> > worth it for now.
> >
>
> Hey Gary,
>
> I still don't understand what you are suggesting. At first I thought this
> was about accessing a subset of the actual columns (you said your file has
> 90 columns but you are only interested in ~10).
>
> Your last message sounds more like you're looking for a better way to make
> sure the headers parsed from the file match what you are expecting. I guess
> this is why getHeaderMap is now public (?!)
>

> What am I missing?
>

Sorry, I seem to keep mixing up the topics. For my many-columned file,
I'm going with withHeader() [no args] and get(String). That's good enough,
but I still need proper header skipping, which is now in.
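
To make the name-based lookup concrete, here is a minimal sketch in plain
Java. It does not use the Commons CSV API: HeaderLookup is a hypothetical
helper, and the comma split is deliberately naive (no quoting or escaping).
It only illustrates why reading the names from the first record and then
looking values up by name keeps the app independent of column order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Commons CSV: maps header names to column
// positions so that values can be looked up by name instead of by index.
final class HeaderLookup {
    private final Map<String, Integer> indexByName = new LinkedHashMap<>();

    // Reads the column names from the header line.
    // Assumption: a naive comma split, with no quoting or escaping support.
    HeaderLookup(String headerLine) {
        String[] names = headerLine.split(",", -1);
        for (int i = 0; i < names.length; i++) {
            indexByName.put(names[i].trim(), i);
        }
    }

    // Returns the value of the named column, failing fast on unknown names.
    String get(String recordLine, String columnName) {
        Integer index = indexByName.get(columnName);
        if (index == null) {
            throw new IllegalArgumentException("No such column: " + columnName);
        }
        String[] values = recordLine.split(",", -1);
        if (index >= values.length) {
            throw new IllegalArgumentException("Record has no value for column: " + columnName);
        }
        return values[index].trim();
    }
}
```

With this sketch, a file whose header is "Id,First Name,City" and one whose
header is "City,Id,First Name" both answer get(record, "First Name") with
the right value, and an unknown name throws on the first lookup.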

Yes, I'm looking for what amounts to schema validation, but since
get(String) will fail on the first record, that's fail-fast enough for now
:)
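
For the schema-validation idea, a rough sketch of an up-front check could
look like the following. Everything here is hypothetical and not part of
Commons CSV: the AppColumn enum, its column names, and the HeaderCheck
helper are made up for illustration. The enum carries the header text as a
field, which is one way around column names that contain spaces. The set of
actual header names is assumed to come from the parsed header record, for
example the key set of the map returned by getHeaderMap().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical column enum: the constant is the Java-friendly key, the
// header text is what appears in the file (spaces allowed).
enum AppColumn {
    CUSTOMER_ID("Customer Id"),
    FIRST_NAME("First Name"),
    POSTAL_CODE("Postal Code");

    private final String headerText;

    AppColumn(String headerText) {
        this.headerText = headerText;
    }

    String headerText() {
        return headerText;
    }
}

// Hypothetical helper: checks that the columns the app cares about are a
// subset of the columns actually present in the file.
final class HeaderCheck {
    private HeaderCheck() {
    }

    // Returns the required header names that the file does not contain.
    static List<String> missing(Set<String> actualHeaders) {
        List<String> missing = new ArrayList<>();
        for (AppColumn column : AppColumn.values()) {
            if (!actualHeaders.contains(column.headerText())) {
                missing.add(column.headerText());
            }
        }
        return missing;
    }

    // Fails before any record is read if a required column is absent.
    static void requireAll(Set<String> actualHeaders) {
        List<String> missing = missing(actualHeaders);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Missing required columns: " + missing);
        }
    }
}
```

Extra columns in the file are ignored by this check, so a 90-column file
passes as long as the handful of required names are present, in any order.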

getHeaderMap() has been public for a long time, so that's not an issue here.

getHeader() OTOH is now public because I want to be able to build on one
format to get a new one.

Gary



>
> Benedikt
>
>
> >
> > Gary
> >
> >
> > > Benedikt
> > >
> > >
> > >
> > > --
> > > http://people.apache.org/~britter/
> > > http://www.systemoutprintln.de/
> > > http://twitter.com/BenediktRitter
> > > http://github.com/britter
> > >
> >
> >
> >
> > --
> > E-Mail: [email protected] | [email protected]
> > Java Persistence with Hibernate, Second Edition<
> > http://www.manning.com/bauer3/>
> > JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
> > Spring Batch in Action <http://www.manning.com/templier/>
> > Blog: http://garygregory.wordpress.com
> > Home: http://garygregory.com/
> > Tweet! http://twitter.com/GaryGregory
> >
>
>
>
> --
> http://people.apache.org/~britter/
> http://www.systemoutprintln.de/
> http://twitter.com/BenediktRitter
> http://github.com/britter
>



-- 
E-Mail: [email protected] | [email protected]
Java Persistence with Hibernate, Second Edition<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory
