I've been working on my first Smalltalk program, which needs to read
and write large C structs from a binary file. I wrote two classes,
BinaryStreamReader and BinaryStreamWriter, that take a stream and can
read (or write) all of the integer and floating point types I need
(they also handle byte-swapping if necessary). I wrote a test program that
focuses on just reading a small (for us) 123 MB data file on disk. The
program takes about 166 seconds to run, compared to 1.2 seconds for an
equivalent C version (about 140x faster than the Squeak version).

As an example of the style of code I've written, here is the method
that reads an unsigned 32-bit integer:

uint32
        " returns the next unsigned, 32-bit integer from the binary stream "
        " see PositionableStream for original implementation."
        | n a b c d |
        isBigEndian
                ifTrue:
                        [ a := stream next.
                        b := stream next.
                        c := stream next.
                        d := stream next ]
                ifFalse:
                        [ d := stream next.
                        c := stream next.
                        b := stream next.
                        a := stream next ].
        ((((a notNil and: [ b notNil ]) and: [ c notNil ])) and: [ d notNil])
                ifTrue:
                        [ n := a.
                        n := (n bitShift: 8) + b.
                        n := (n bitShift: 8) + c.
                        n := (n bitShift: 8) + d ]
                ifFalse: [ n := nil ].
        ^ n

There are 4 calls to stream next for each integer and, sure enough,
a profile of the code (attached below) shows that most of the time is
being lost in the StandardFileStream basicNext and next methods. There
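must be a better way to do this. Scaled up to operational code, I will
need to process about 40 GB of data per day. My C code currently takes
about 16 CPU hours to do this work (including number crunching). At the
rate measured above (166 seconds per 123 MB), just reading the data in
Squeak would take about 15 CPU hours per day, and if the whole job ran
140x slower it would take about 3 CPU months!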
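
For illustration, here is a minimal sketch of one way the per-integer
call count might be cut from four stream calls to one. It is untested
against this code base and rests on assumptions: that stream is in
binary mode (so next: answers a ByteArray), that next: answers fewer
than 4 bytes at end of file, and that the image provides
ByteArray>>unsignedLongAt:bigEndian:.

```smalltalk
uint32
        " Sketch only: fetch all four bytes with one stream call. "
        " Assumes stream is binary, so next: answers a ByteArray. "
        | bytes |
        bytes := stream next: 4.
        " A short read is treated as end of file, like the nil check above. "
        bytes size < 4 ifTrue: [ ^ nil ].
        " Let the ByteArray assemble the integer in the requested byte order. "
        ^ bytes unsignedLongAt: 1 bigEndian: isBigEndian
```

The same idea could be taken further by reading a whole struct into
one ByteArray and decoding each field from it by offset, so the file
stream is touched once per struct rather than once per byte.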

Hopefully, someone can help me out here. The working code is available
on squeaksource.com if anyone is interested:

http://www.squeaksource.com/@CWlm_vX4hAPUzk5w/7SVjQQhp

Thanks,

David

Below is a message tally of my program:



 - 166088 tallies, 166100 msec.

**Tree**
100.0% {166100ms} SEAFileReader>>printAllBlocks
  99.9% {165934ms} ProcessedPingBlock>>readFrom:
    99.9% {165934ms} XYZAPingData>>readFrom:
      99.7% {165602ms} XYZATransducerData>>readFrom:
        95.9% {159290ms} XYZAPointData>>readFrom:
          46.4% {77070ms} BinaryStreamReader>>double
            |41.9% {69596ms} BinaryStreamReader>>uint32
            |  |28.1% {46674ms} StandardFileStream>>next
            |  |  |14.1% {23420ms} primitives
            |  |  |14.0% {23254ms} StandardFileStream>>basicNext
            |  |9.8% {16278ms} LargePositiveInteger>>+
            |  |  |6.1% {10132ms} LargePositiveInteger(Integer)>>+
            |  |  |  |3.1% {5149ms} primitives
            |  |  |  |3.0% {4983ms} SmallInteger(Number)>>negative
            |  |  |3.7% {6146ms} primitives
            |  |4.1% {6810ms} primitives
            |2.5% {4153ms} Float class(Behavior)>>new:
            |2.0% {3322ms} primitives
          13.9% {23088ms} BinaryStreamReader>>float
            |10.4% {17274ms} BinaryStreamReader>>uint32
            |  |7.0% {11627ms} StandardFileStream>>next
            |  |  |3.5% {5814ms} primitives
            |  |  |3.5% {5814ms} StandardFileStream>>basicNext
            |  |2.4% {3986ms} LargePositiveInteger>>+
            |2.2% {3654ms} Float class>>fromIEEE32Bit:
          13.7% {22756ms} BinaryStreamReader>>int32
            |7.7% {12790ms} BinaryStreamReader>>uint32
            |  |6.8% {11295ms} StandardFileStream>>next
            |  |  3.5% {5814ms} StandardFileStream>>basicNext
            |  |  3.4% {5647ms} primitives
            |5.2% {8637ms} SmallInteger>>>=
            |  4.3% {7142ms} SmallInteger(Magnitude)>>>=
            |    3.5% {5814ms} SmallInteger>><
            |      2.6% {4319ms} SmallInteger(Integer)>><
          10.7% {17773ms} BinaryStreamReader>>uint16
            |6.9% {11461ms} StandardFileStream>>next
            |  |3.5% {5814ms} StandardFileStream>>basicNext
            |  |3.3% {5481ms} primitives
            |3.8% {6312ms} primitives
          6.8% {11295ms} BinaryStreamReader>>skip:
            |5.0% {8305ms} StandardFileStream>>skip:
          3.4% {5647ms} BinaryStreamReader>>int8
            2.6% {4319ms} BinaryStreamReader>>uint8

**Leaves**
25.4% {42189ms} StandardFileStream>>basicNext
25.2% {41857ms} StandardFileStream>>next
6.0% {9966ms} BinaryStreamReader>>uint32
5.6% {9302ms} SmallInteger(Number)>>negative
4.6% {7641ms} LargePositiveInteger>>+
3.8% {6312ms} LargePositiveInteger(Integer)>>+
3.8% {6312ms} BinaryStreamReader>>uint16
3.4% {5647ms} Float class(Behavior)>>new:
2.0% {3322ms} BinaryStreamReader>>double

**Memory**
        old                     +3,705,004 bytes
        young           -28,800 bytes
        used            +3,676,204 bytes
        free            +362,744 bytes

**GCs**
        full                    50 totalling 2,524ms (2.0% uptime), avg 50.0ms
        incr            19959 totalling 2,794ms (2.0% uptime), avg 0.0ms
        tenures         6,041 (avg 3 GCs/tenure)
        root table      0 overflows




-- 
David Finlayson, Ph.D.
Operational Geologist

U.S. Geological Survey
Pacific Science Center
400 Natural Bridges Drive
Santa Cruz, CA 95060, USA

Tel: 831-427-4757, Fax: 831-427-4748, E-mail: [EMAIL PROTECTED]