Re: [Boston.pm] Dealing with different EOL's

Uri Guttman Thu, 24 Jan 2008 08:20:22 -0800

>>>>> "AB" == Alex Brelsfoard <[EMAIL PROTECTED]> writes:


  AB> We are reading in feeds (typically TSV or CSV files), parsing
  AB> them, reformatting the data, and spitting it out as another file.
  AB> These feeds can come from all sorts of people/places/things.  So
  AB> sometimes they are wonderfully formatted and we understand their
  AB> content.  Sometimes they are not, and we do not.

  AB> Their CMS may just enclose this field in quotes and move on.

if the field is in quotes, then i think a CSV module can parse that for
you regardless of embedded line breaks. check them out on cpan and see
what they offer.
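
a minimal sketch of that idea, assuming the Text::CSV module from cpan is
installed. the feed contents and variable names here are made up for
illustration, and real feeds may need other options:

```perl
use strict;
use warnings;
use Text::CSV;    # from cpan; assumed to be installed

# made-up feed: the description field on row 1 has an embedded line break
my $feed = qq{id,description\r\n1,"first line\nsecond line"\r\n2,plain\r\n};

# binary => 1 lets quoted fields contain line breaks
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# read from an in-memory handle so the sketch needs no file on disk
open my $fh, '<', \$feed or die "can't open in-memory feed: $!";
my @rows;
while (my $row = $csv->getline($fh)) {
    push @rows, $row;    # each row is an array ref of fields
}
close $fh;

printf "%d rows\n", scalar @rows;
```

for a TSV feed the same constructor should take sep_char => "\t". this only
helps when the embedded line breaks really are inside quoted fields.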

  AB> So now we have a feed where the description column may have
  AB> linebreaks in it.  So I can't just split on any form of linebreak.

so you need to come up with a better spec for a 'row' and embedded line
breaks. again, showing some real data will help. note that we need to
see the actual line break problem, so make sure your mailer doesn't also
wrap the lines, which would make it impossible to figure out.

  AB> Does this make a bit more sense?

sorta. but concrete examples are needed. there may be clues as to how to
parse these lines, such as embedded line breaks always happening around
certain types of chars, etc. if they are truly random then you may be in
deep doodoo of heuristics. i was there once too when parsing various
data feeds. there was no way to get it 100% correct.

  AB> btw, there's no chance that I could define $/ as a regex, could I?

perl itself treats $/ as a literal string, never a regex. but
File::Slurp supports a regex for the line separator. set $/ to a regex
(localized) before a call to read_file in list context and you should
get lines as you want.
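
a core-perl sketch of the same idea, for when File::Slurp isn't available:
slurp the whole feed into a scalar and split it on a regex that matches any
of the usual EOLs. split_lines and the sample buffer are hypothetical names
made up for this example:

```perl
use strict;
use warnings;

# hypothetical helper: split a slurped buffer on any of the usual EOLs.
# \r\n must come first so a DOS ending isn't seen as two breaks.
sub split_lines {
    my ($text) = @_;
    return split /\r\n|\r|\n/, $text;
}

# made-up sample mixing DOS, old Mac and Unix line endings
my $buffer = "dos line\r\nmac line\runix line\n";
my @lines  = split_lines($buffer);

print "$_\n" for @lines;
```

to get the buffer from a real file, something like
my $buffer = do { local $/; <$fh> } slurps it in one read. note that this
splits on every line break, so it doesn't solve the embedded line break
problem by itself.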

uri

-- 
Uri Guttman  ------  [EMAIL PROTECTED]  --------  http://www.sysarch.com --
-----  Perl Architecture, Development, Training, Support, Code Review  ------
-----------  Search or Offer Perl Jobs  ----- http://jobs.perl.org  ---------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------
 