Re: [Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-16 Thread Tomasz Zielonka
On Fri, Jun 15, 2007 at 11:31:36PM +0100, Jim Burton wrote: I think that would only work if there was one column per line...I didn't make it clear that as well as being comma separated, the delimiter is around each column, of which there are several on a line so if the delimiter is ~ a file
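Jim's point is that one physical line can hold several `~`-wrapped columns, so a newline cannot be judged on its own. One line-oriented workaround is to count delimiters. This sketch assumes `~` never appears inside column content. A line with an odd number of `~` leaves a field open, so the next physical line is glued onto it without the newline. `joinRecords` is a hypothetical helper, not code from the thread.

```haskell
-- | Hypothetical helper: glue physical lines back into logical records.
-- A line that leaves an odd number of delimiters open ends inside a field,
-- so the following line is appended to it (the newline is dropped).
-- Assumes the delimiter never occurs inside field content.
joinRecords :: Char -> [String] -> [String]
joinRecords del = go
  where
    delims = length . filter (== del)

    go []     = []
    go (l:ls) = collect l (delims l) ls

    -- acc: the record built so far; n: number of delimiters seen in it
    collect acc n rest
      | even n    = acc : go rest
      | otherwise = case rest of
          []        -> [acc]  -- unterminated field at end of input
          (l:rest') -> collect (acc ++ l) (n + delims l) rest'

main :: IO ()
main = interact (unlines . joinRecords '~' . lines)
```

The `acc ++ l` step recopies the record built so far. That is fine for records spanning a few lines, but a difference list would be kinder to pathological inputs.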

Re: [Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-16 Thread Donn Cave
Quoth Tomasz Zielonka [EMAIL PROTECTED]: | On Fri, Jun 15, 2007 at 11:31:36PM +0100, Jim Burton wrote: | I think that would only work if there was one column per line...I didn't | make it clear that as well as being comma separated, the delimiter is | around each column, of which there are

Re: [Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-16 Thread Jim Burton
Tomasz Zielonka wrote: On Fri, Jun 15, 2007 at 11:31:36PM +0100, Jim Burton wrote: I think that would only work if there was one column per line...I didn't make it clear that as well as being comma separated, the delimiter is around each column, of which there are several on a line so if the

Re: [Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-16 Thread Tomasz Zielonka
On Sat, Jun 16, 2007 at 12:08:22PM +0100, Jim Burton wrote: Tomasz Zielonka wrote: It would be easier to experiment if you could provide us with an example input file. If you are worried about revealing sensitive information, you can change all characters other than newline, ~ and , to As,
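The masking Tomasz suggests is a one-line `map`. This is a minimal sketch: newline, `~` and `,` are kept so the structure of the file survives, and every other character becomes `A`. The name `anonymise` is illustrative only.

```haskell
-- | Mask everything except the structural characters: newline, the '~'
-- delimiter and the ',' separator are kept; anything else becomes 'A'.
anonymise :: String -> String
anonymise = map mask
  where
    mask c
      | c `elem` "\n~," = c
      | otherwise       = 'A'

main :: IO ()
main = interact anonymise
```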

Re: [Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-16 Thread Jim Burton
Tomasz Zielonka wrote: I guess you've tried to convince Oracle to produce the right format in the first place, so there would be no need for post-processing...? We don't control that job or the first db. I wonder what you would get if you set the delimiter to be a newline ;-) eek! ;-)

[Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-15 Thread Jim Burton
I need to remove newlines from csv files (within columns, not at the end of entire lines). This is prior to importing into a database and was being done at my workplace by a java class for quite a while until the files processed got bigger and it proved to be too slow. (The files are up to ~250MB
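Here is a minimal sketch of the task. It assumes, as described elsewhere in the thread, that each column is wrapped in `~`, and that `~` never occurs inside column content. Splitting on the delimiter then makes the pieces alternate between outside-a-field and inside-a-field, so only every second piece needs its newlines removed. `cleanFields` is a hypothetical name. It uses only `split`, `filter`, `intercalate`, `singleton` and `interact` from `Data.ByteString.Lazy.Char8`.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- | Remove newlines that occur inside delimited fields.
-- After splitting on the delimiter, pieces at even positions lie between
-- fields and pieces at odd positions lie inside a field.
-- Assumes every field is wrapped in a pair of delimiters and that the
-- delimiter never appears in field content.
cleanFields :: Char -> B.ByteString -> B.ByteString
cleanFields del =
    B.intercalate (B.singleton del)
  . zipWith clean (cycle [False, True])
  . B.split del
  where
    clean inside piece
      | inside    = B.filter (/= '\n') piece  -- newline inside a field: drop it
      | otherwise = piece                     -- separators and record ends: keep

main :: IO ()
main = B.interact (cleanFields '~')
```

The lazy `split` produces its pieces incrementally, so memory use should be bounded by the longest field rather than the whole file. This has not been benchmarked against the files in question.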

Re: [Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-15 Thread Thomas Schilling
On 15 jun 2007, at 18.13, Jim Burton wrote: import qualified Data.ByteString.Char8 as B Have you tried import qualified Data.ByteString.Lazy.Char8 as B ?

Re: [Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-15 Thread Jim Burton
Thomas Schilling wrote: On 15 jun 2007, at 18.13, Jim Burton wrote: import qualified Data.ByteString.Char8 as B Have you tried import qualified Data.ByteString.Lazy.Char8 as B ? No -- I'll give it a try and compare them. Is laziness preferable here? Thanks,

Re: [Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-15 Thread Jason Dagit
On 6/15/07, Jim Burton [EMAIL PROTECTED] wrote: No -- I'll give it a try and compare them. Is laziness preferable here? Laziness might give you constant space usage (if you are sufficiently lazy). Which would help with the thrashing. Jason
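The difference Jason points to can be seen by writing the same filter against both `Char8` modules. The code is nearly identical, but `Data.ByteString.Lazy.Char8.interact` reads stdin in chunks on demand, so a pipeline like this should run in roughly constant memory. The strict module reads the whole input into one buffer first. The function names are hypothetical, and stripping every newline is only a stand-in for the real column-aware cleaning.

```haskell
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L

-- Strict: the whole input is a single contiguous buffer in memory.
stripStrict :: S.ByteString -> S.ByteString
stripStrict = S.filter (/= '\n')

-- Lazy: the input is a list of chunks read on demand, so the filter
-- can stream through a large file chunk by chunk.
stripLazy :: L.ByteString -> L.ByteString
stripLazy = L.filter (/= '\n')

main :: IO ()
main = L.interact stripLazy
```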

Re: [Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-15 Thread Thomas Schilling
On 15 jun 2007, at 21.14, Jim Burton wrote: Thomas Schilling wrote: On 15 jun 2007, at 18.13, Jim Burton wrote: import qualified Data.ByteString.Char8 as B Have you tried import qualified Data.ByteString.Lazy.Char8 as B ? No -- I'll give it a try and compare them. Is laziness preferable

Re: [Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-15 Thread Sebastian Sylvan
On 15/06/07, Jim Burton [EMAIL PROTECTED] wrote: I need to remove newlines from csv files (within columns, not at the end of entire lines). This is prior to importing into a database and was being done at my workplace by a java class for quite a while until the files processed got bigger and

Re: [Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-15 Thread Jim Burton
Sebastian Sylvan wrote: On 15/06/07, Jim Burton [EMAIL PROTECTED] wrote: [snip] Hi, Hi Sebastian, I haven't compiled this, but you get the general idea: import qualified Data.ByteString.Lazy.Char8 as B -- takes a bytestring representing the file, concats the lines -- then splits it up into

Re: [Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-15 Thread Jason Dagit
On 6/15/07, Jim Burton [EMAIL PROTECTED] wrote: Sebastian Sylvan wrote: On 15/06/07, Jim Burton [EMAIL PROTECTED] wrote: [snip] Hi, Hi Sebastian, I haven't compiled this, but you get the general idea: import qualified Data.ByteString.Lazy.Char8 as B -- takes a bytestring representing the

Re: [Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-15 Thread Jim Burton
Jason Dagit wrote: [snip] I love to see people using Haskell, especially professionally, but I have to wonder if the real tool for this job is sed? :-) Jason Maybe it is -- I've never used sed. (cue oohs and ahhs from the gallery?) But from the (unquantified) gains so far haskell may

Re: [Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-15 Thread Brandon S. Allbery KF8NH
On Jun 15, 2007, at 18:37 , Jason Dagit wrote: I love to see people using Haskell, especially professionally, but I have to wonder if the real tool for this job is sed? :-) Actually, while sed could do that, it'd be a nightmare. You really want a parser to deal with general CSV like this,
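Brandon's point about wanting a parser can be illustrated with a small character-level state machine. It remembers whether it is inside a delimited field and drops newlines only there. One part of this is an assumption borrowed from common CSV practice (RFC 4180 quoting), not something the thread confirms for these files: a doubled delimiter inside a field is treated as an escaped literal. `cleanCsv` is a hypothetical name. It works on a lazy `String`, so it streams, but it will likely be much slower than a ByteString version.

```haskell
-- | Hypothetical state-machine cleaner: drops '\n' only while inside a
-- delimited field. A doubled delimiter inside a field is treated as an
-- escaped literal delimiter (RFC 4180-style assumption).
cleanCsv :: Char -> String -> String
cleanCsv del = outside
  where
    -- Between fields: copy everything, watch for an opening delimiter.
    outside []     = []
    outside (c:cs)
      | c == del   = c : inside cs
      | otherwise  = c : outside cs

    -- Inside a field: drop newlines, handle escaped and closing delimiters.
    inside []      = []
    inside (c:cs)
      | c == del   = case cs of
          (d:ds) | d == del -> c : d : inside ds  -- escaped delimiter
          _                 -> c : outside cs     -- closing delimiter
      | c == '\n'  = inside cs                    -- embedded newline: drop
      | otherwise  = c : inside cs

main :: IO ()
main = interact (cleanCsv '~')
```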

Re: [Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-15 Thread Jim Burton
Sebastian Sylvan wrote: Ah, sorry, I thought the delimiter was a line delimiter. I'm trying to get to that fusion goodness by using built-in functions as much as possible... How about this one: clean del = B.map ( B.filter (/='\n') ) . B.groupBy (\x y -> (x,y) /= (del,'\n')) That groupBy will

Re: [Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-15 Thread Sebastian Sylvan
On 16/06/07, Jim Burton [EMAIL PROTECTED] wrote: Sebastian Sylvan wrote: Ah, sorry, I thought the delimiter was a line delimiter. I'm trying to get to that fusion goodness by using built-in functions as much as possible... How about this one: clean del = B.map ( B.filter (/='\n') ) .

Re: [Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

2007-06-15 Thread Jason Dagit
On 6/15/07, Sebastian Sylvan [EMAIL PROTECTED] wrote: Benchmark it I guess :-) Both versions use non-bytestring recursive functions (the outer B.map should just be a straight map, and yours uses a foldr), which may mess fusion up... Not sure what would happen here... I don't have a Haskell
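Jason's remark about the outer `B.map` comes down to types. `B.map` has type `(Char -> Char) -> ByteString -> ByteString`, while `B.groupBy` returns a `[ByteString]`, so the per-group step needs Prelude's `map`. It also helps to know how `groupBy` applies its predicate, since the one-liner's lambda depends on it. The library compares each character with the first character of the current group, not with its immediate neighbour. The small check below illustrates both points. `stripEachGroup` and `firstElementGroups` are illustrative names only.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- B.groupBy yields a list of ByteStrings, so the per-group step uses
-- Prelude's 'map'; B.map only maps Char -> Char within one ByteString.
stripEachGroup :: [B.ByteString] -> [B.ByteString]
stripEachGroup = map (B.filter (/= '\n'))

-- groupBy tests each character against the FIRST character of the
-- current group: every later digit here is greater than '1', so the
-- whole string stays in one group (a neighbour-by-neighbour test would
-- have split it after "123").
firstElementGroups :: [B.ByteString]
firstElementGroups = B.groupBy (<) (B.pack "1232")

main :: IO ()
main = print firstElementGroups
```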