Forgot to mention some things:

- https://github.com/alamar/clojure-xml-stream on github.
- I'm yet to figure out that Lenin thing, so ant.
- The two-step handler system (there's a function that takes a method
and returns a handler, and handler accepts item being constructed and
stream-reader) seems suboptimal, maybe I'll figure it out later.

On 31 май, 23:25, Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:
> Hi *! I've tried a few searches on parsing XML files larger than
> memory, didn't find anything and wrote a simple framework for parsing
> XML via STAX to lazy sequence of defrecords. It is therefore capable
> of reading several GB of xml without much problems. It is quite
> declarative but also quite ugly.
>
> Take a peek:
> (technical babble after the fold)
>
> $ git clone git://github.com/alamar/clojure-xml-stream.git
> $ ant
>
> It turns this completely-invented XML:
>
> <ground>
>   <tree-species>
>     <tree id="1"><name>Pine</name></tree>
>     <tree id="2"><name>Birch</name></tree>
>     <tree id="4"><name>Palmtree</name></tree>
>   </tree-species>
>   <forests>
>     <forest id="1">
>       <name>Red Forest</name>
>       <trees>
>         <tree refid="1"><branch direction="left"/><branch
> direction="south"/></tree>
>         <tree refid="2"><branch direction="right"/><branch
> direction="south"/><branch direction="west"/></tree>
>         <tree refid="1"><branch direction="southwest"/></tree>
>       </trees>
>     </forest>
>     <forest id="2">
>       <name>Dark Forest</name>
>       <trees>
>         <tree refid="2"><branch direction="right"/><branch
> direction="south"/><branch direction="west"/></tree>
>         <tree refid="4"><branch direction="northwest"/></tree>
>       </trees>
>     </forest>
>   </forests>
> </ground>
>
> into a lazy sequence of:
>
>  #:example.TreeSpecies{:id 1, :name Pine}
>  #:example.TreeSpecies{:id 2, :name Birch}
>  #:example.TreeSpecies{:id 4, :name Palmtree}
>  #:example.Forest{:id 1, :trees [#:example.Tree{:species-id
> 1, :branches (:left :south)} #:example.Tree{:species-id 2, :branches
> (:right :south :west)} #:example.Tree{:species-id 1, :branches
> (:southwest)}], :name Red Forest}
>  #:example.Forest{:id 2, :trees [#:example.Tree{:species-id
> 2, :branches (:right :south :west)} #:example.Tree{:species-id
> 4, :branches (:northwest)}], :name Dark Forest}
>
> using this code:
>
> (defrecord TreeSpecies [id name])
> (defrecord Forest [id trees name])
> (defrecord Tree [species-id branches])
>
> (defmulti ground-element first-arg)
>
> (defmulti tree-element first-arg)
>
> (defmethod ground-element :tree [_ stream-reader]
>   (TreeSpecies. (attribute-value stream-reader "id") nil))
>
> (defmethod ground-element [:TreeSpecies :name] [_ stream-reader tree]
>   (assoc tree :name (element-text stream-reader)))
>
> (defmethod ground-element :forest [_ stream-reader]
>   (Forest. (attribute-value stream-reader "id") [] nil))
>
> (defmethod ground-element [:Forest :name] [_ stream-reader forest]
>   (assoc forest :name (element-text stream-reader)))
>
> (defmethod ground-element [:Forest :tree] [_ stream-reader forest]
>   (assoc forest :trees
>     (conj (:trees forest)
>       (Tree. (attribute-value stream-reader "refid")
>         (dispatch-partial stream-reader
>           (element-struct-handler tree-element))))))
>
> (defmethod tree-element :branch [_ stream-reader]
>   (keyword (attribute-value stream-reader "direction")))
>
> (defmethod ground-element :default [token & whatever]
>   (comment println token))
> (defmethod tree-element :default [token & whatever]
>   (comment println token))
>
> (defn run [path]
>   (with-open [input-stream (FileInputStream. path)]
>     (let [handler (element-struct-handler ground-element)
>           objects (parse-dispatch input-stream handler)]
>       (doseq [object objects] (println object)))))
>
> How it works: it reads elements and calls a method with
> the :ElementName
> If the method returns a record, it stuffs anything found in that
> element into this record.
> It can handle nested structures because it can parse subtrees (there
> is an example in code).
> The handler have to read events from stax (to get text nodes, for
> example), the only limitation is that handler should never iterate
> past END_ELEMENT of the element it was called on (or the parser would
> become confused).
>
> The syntax and the way I call assoc seem ugly to me, so all
> suggestions are welcome.
> Suggestions on naming and general architecture are welcome too.
> Maybe this can grow into something generally usable.
>
> Feel free to fork, use and complain.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Reply via email to