Hi,
I just started with cl-pdf and it works great for me :)
but I found some problems in pdf-parser and need advice
on how to fix them. I am a rather novice Lisper, so my
guesses below may be wrong...


1. In file cl-pdf, function find-cross-reference-start

The function searches for 'startxref' in the buffer _from the beginning_,
so it can find the wrong one if the end of the file (the buffer) contains
two such sections (e.g. after a small incremental change appended at the
end of the file).

Proposal: change

    (let ((position (search "startxref" buffer)))

to

    (let ((position (search "startxref" buffer :from-end t)))
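To see the difference outside cl-pdf, here is a small standalone sketch that
runs both calls on a made-up tail buffer containing two startxref sections,
as an incremental update leaves behind. The buffer contents and the helper
name last-startxref-offset are hypothetical; cl-pdf's real buffer handling
may differ.

```lisp
;;; Standalone illustration, not cl-pdf code. LAST-STARTXREF-OFFSET is a
;;; hypothetical helper and *SAMPLE-TAIL* is made-up sample data.

(defparameter *sample-tail*
  ;; Two trailers, as left behind by one incremental update.
  ;; ~% in FORMAT emits a newline; a bare % is a literal character.
  (format nil "startxref~%116~%%%EOF~%startxref~%89502~%%%EOF~%"))

(defun last-startxref-offset (buffer)
  "Return the offset written after the LAST \"startxref\" keyword in
BUFFER, or NIL if the keyword is not present."
  (let ((position (search "startxref" buffer :from-end t)))
    (when position
      ;; PARSE-INTEGER skips the leading newline; :JUNK-ALLOWED makes it
      ;; stop at the newline that follows the digits.
      (parse-integer buffer
                     :start (+ position (length "startxref"))
                     :junk-allowed t))))

;; (search "startxref" *sample-tail*)             ; => 0, the older section
;; (search "startxref" *sample-tail* :from-end t) ; => 20, the newer one
;; (last-startxref-offset *sample-tail*)          ; => 89502
```

With :from-end t, SEARCH returns the start of the rightmost match, so the
offset read after it is the one for the newest cross-reference section.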



2. In file cl-pdf, function make-indirect-object:

(defun make-indirect-object (obj-number gen-number position)
  (let ((object (or (car (gethash (cons obj-number gen-number) *indirect-objects*))
                    (make-instance 'indirect-object
                                   :obj-number obj-number
                                   :gen-number gen-number
                                   :content :unread
                                   :no-link t))))
    (setf (gethash (cons obj-number gen-number) *indirect-objects*) (cons object position))
    object))

I am working on a file generated by Adobe Acrobat Distiller
and then cropped in Adobe Acrobat, so at the end of the file there
are a few modified objects with duplicate numbers (and generations,
which may be a bug in Acrobat?). When the indirect objects are read
from the file (in the order of the cross-reference tables, which are
read from newest to oldest), the newer ones are overwritten by older
ones with the same number. We end up with a readable PDF, but with
some object revisions dropped.

I added some debug printing to the function above (and to a few
others), and for a sample file I got this reading order:

startxref position: 89502
xref position: 89502
making obj: 4 0 position 85386
making obj: 5 0 position 89106
making obj: 8 0 position 89309
making obj: 7 0 position 0
xref position: 116
making obj: 6 0 position 16
making obj: 7 0 position 1150
making obj: 8 0 position 1227
making obj: 9 0 position 1411
making obj: 10 0 position 1554
(...)
making obj: 37 0 position 936
xref position: 85210
making obj: 1 0 position 81250
making obj: 2 0 position 81284
making obj: 3 0 position 81308
making obj: 4 0 position 81359
making obj: 5 0 position 85007

This shows that the file contains 4 duplicated objects (4 0, 5 0,
8 0 and 7 0) and that they are overwritten by their older versions.
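The overwriting can be reproduced without cl-pdf. This sketch replays the
entries for the duplicated objects from the log above, in reading order,
into a plain EQUAL hash table keyed on (obj-number . gen-number), using an
unconditional SETF as the original function does. The names
register-overwriting, replay and *reading-order* are hypothetical stand-ins
for make-indirect-object and *indirect-objects*; only positions are stored.

```lisp
;;; Standalone reproduction, not cl-pdf code. All names are hypothetical.

(defparameter *reading-order*
  ;; (obj-number gen-number position), in the order the xref sections
  ;; were read (newest first), taken from the debug log above.
  '((4 0 85386) (5 0 89106) (8 0 89309) (7 0 0)   ; xref at 89502
    (7 0 1150) (8 0 1227)                         ; xref at 116
    (4 0 81359) (5 0 85007)))                     ; xref at 85210

(defun register-overwriting (table obj-number gen-number position)
  "Like the original MAKE-INDIRECT-OBJECT: always store POSITION,
replacing whatever an earlier (newer) xref entry recorded."
  (setf (gethash (cons obj-number gen-number) table) position))

(defun replay (register entries)
  "Feed ENTRIES to the function REGISTER in order; return the table."
  (let ((table (make-hash-table :test #'equal)))
    (dolist (entry entries table)
      (destructuring-bind (obj-number gen-number position) entry
        (funcall register table obj-number gen-number position)))))

;; (gethash '(4 . 0) (replay #'register-overwriting *reading-order*))
;; => 81359, the OLDER revision, although 85386 was read first.
```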


I think the solution would be to drop older objects when a newer
version with the same number and generation has already been read?
Something like this:

(defun make-indirect-object (obj-number gen-number position)
  (let ((object (gethash (cons obj-number gen-number) *indirect-objects*)))
    (if object
        (progn
          (format T "obj alredy present: ~s ~s at position ~s (dropped older one at position ~s)~%"
                  obj-number gen-number
                  (cdr object) position)
          (car object))
        (progn
          (format T "making obj: ~s ~s position ~s ~%" obj-number gen-number position)
          (let ((new-object (make-instance 'indirect-object
                                           :obj-number obj-number
                                           :gen-number gen-number
                                           :content :unread
                                           :no-link t)))
            (setf (gethash (cons obj-number gen-number) *indirect-objects*) (cons new-object position))
            new-object)))))
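The same first-wins rule, without the printing and the indirect-object
instances, can be checked in isolation. This is a minimal sketch assuming,
as in the log, that cross-reference sections are visited newest first; the
table maps (obj-number . gen-number) to a file position only, and the names
register-first-wins and collect-xref-entries are hypothetical.

```lisp
;;; Standalone sketch, not cl-pdf code. All names are hypothetical.

(defun register-first-wins (table obj-number gen-number position)
  "Store POSITION only if (OBJ-NUMBER . GEN-NUMBER) has no entry yet.
Since xref sections are read newest first, the first entry seen is the
current revision. Return T if stored, NIL if the entry was dropped."
  (let ((key (cons obj-number gen-number)))
    ;; The second value of GETHASH says whether the key is present,
    ;; so the check stays correct even for a stored position of 0.
    (unless (nth-value 1 (gethash key table))
      (setf (gethash key table) position)
      t)))

(defun collect-xref-entries (entries)
  "Register ENTRIES, a list of (obj-number gen-number position) in reading
order. Return the table and the list of dropped (older) entries."
  (let ((table (make-hash-table :test #'equal))
        (dropped '()))
    (dolist (entry entries (values table (nreverse dropped)))
      (unless (apply #'register-first-wins table entry)
        (push entry dropped)))))
```

Fed the entries from the first log in reading order, it keeps positions
85386, 89106, 89309 and 0 for objects 4, 5, 8 and 7, and reports
(7 0 1150), (8 0 1227), (4 0 81359) and (5 0 85007) as dropped, the same
entries the second log reports.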

which gives, on the same example file:

startxref position: 89502
xref position: 89502
making obj: 4 0 position 85386
making obj: 5 0 position 89106
making obj: 8 0 position 89309
making obj: 7 0 position 0
xref position: 116
making obj: 6 0 position 16
obj alredy present: 7 0 at position 0 (dropped older one at position 1150)
obj alredy present: 8 0 at position 89309 (dropped older one at position 1227)
making obj: 9 0 position 1411
making obj: 10 0 position 1554
(...)
making obj: 37 0 position 936
xref position: 85210
making obj: 1 0 position 81250
making obj: 2 0 position 81284
making obj: 3 0 position 81308
obj alredy present: 4 0 at position 85386 (dropped older one at position 81359)
obj alredy present: 5 0 at position 89106 (dropped older one at position 85007)



But this reveals another problem in read-xref-and-trailer

(defun read-xref-and-trailer (position)
  (let (first-trailer)
    (loop
       (format T "xref position: ~s~%" position)
       (read-cross-reference-subsections position)
       (let* ((trailer (read-trailer)))
         (unless first-trailer (setf first-trailer trailer))
         (let ((prev-position (get-dict-value trailer "/Prev")))
           (if prev-position
               (setq position prev-position)
               (return first-trailer)))))))

If I read it correctly, it reads the trailers from the most recent to
the oldest, and returns the oldest one instead of the first one read?
If so, read-pdf gets incorrect information about the document.
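To check which trailer the loop hands back, its control flow can be
simulated without cl-pdf. In this sketch each trailer is a plist in an
alist keyed by xref position, :prev stands in for "/Prev", and the chain
89502, 116, 85210 follows the order in the log; walk-trailers and the
:label values are hypothetical. Assuming read-trailer never returns NIL,
the (unless first-trailer ...) guard assigns only on the first pass, so
the simulation returns the first trailer read, i.e. the newest one.

```lisp
;;; Standalone simulation of READ-XREF-AND-TRAILER's loop, not cl-pdf code.
;;; All names and trailer contents are hypothetical.

(defparameter *sample-trailers*
  '((89502 :label :newest :prev 116)
    (116   :label :middle :prev 85210)
    (85210 :label :oldest)))

(defun walk-trailers (start-position trailers)
  "Follow :PREV links from START-POSITION through TRAILERS. Return the
first trailer visited and the list of visited positions, in order."
  (let ((position start-position)
        (first-trailer nil)
        (visited '()))
    (loop
      (push position visited)
      (let ((trailer (cdr (assoc position trailers))))
        ;; Same guard as the original: only the first assignment happens.
        (unless first-trailer (setf first-trailer trailer))
        (let ((prev-position (getf trailer :prev)))
          (if prev-position
              (setq position prev-position)
              (return (values first-trailer (nreverse visited)))))))))

;; (walk-trailers 89502 *sample-trailers*)
;; => (:LABEL :NEWEST :PREV 116), (89502 116 85210)
```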

Can someone review the above and tell me whether I am looking in
the right direction, or am entirely wrong...


--
Regards
Piotr Chamera

_______________________________________________
cl-pdf-devel site list
cl-pdf-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/cl-pdf-devel
