Innovation in Libraries 2012
A free post-conference event after the LITA Forum
Invitation and Call for Proposals
Do you love exploring new ideas? Always secretly wished you knew more
about how to create an app? Wonder what the next wave of library
innovation might be?
If you answered yes, then
Hello Silicon Sorcerers,
I was just wondering if there have been any efforts from Code4Lib into
MediaWiki development? I know that there have been some Wikipedia
templates and bots designed to interface with library services. Yet what
about cold hard MediaWiki extensions? Has there been any
I'm working on a script that needs to be able to crosswalk at least a
couple hundred XML files regularly, some of which are quite large.
I've thought of a number of ways to go about this, but I wanted to bounce
this off the list since I'm sure people here deal with this problem all the
time. My
Saxon is really, really efficient with large files. I don't really have
any benchmark stats available, but I have gotten noticeably better
performance from Saxon/XSLT2 than from PHP with DOMDocument or SimpleXML, or
nokogiri and hpricot in Ruby.
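For what it's worth, driving Saxon from Java through the s9api interface is
only a handful of lines; roughly like this (untested sketch, the file names
are placeholders):

import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.XsltCompiler;
import net.sf.saxon.s9api.XsltExecutable;
import net.sf.saxon.s9api.XsltTransformer;
import javax.xml.transform.stream.StreamSource;
import java.io.File;

public class RunCrosswalk {
    public static void main(String[] args) throws Exception {
        Processor processor = new Processor(false);   // false = Saxon-HE
        XsltCompiler compiler = processor.newXsltCompiler();
        // Compile the stylesheet once and reuse it for every input file.
        XsltExecutable stylesheet =
            compiler.compile(new StreamSource(new File("crosswalk.xsl")));
        XsltTransformer transformer = stylesheet.load();
        transformer.setSource(new StreamSource(new File("input.xml")));
        transformer.setDestination(processor.newSerializer(new File("output.xml")));
        transformer.transform();
    }
}

Compiling the stylesheet once and reusing it across a couple hundred files
saves a lot of the per-file overhead.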
Ethan
On Fri, Jun 8, 2012 at 2:36 PM, Kyle Banerjee
I would really consider SAX. In MarcEdit, I had originally utilized an XSLT
process for handling MARCXML translations (using both SAXON and MSXML parsers)
-- but as you noticed -- there ends up being an upper limit to what you can
process. The break point for me was when working with some
*** Apologies for the cross-posting ***
At the IG meeting we will also be looking for an IG chair volunteer and
discussing any programs the IG members are interested in organizing/offering
at the next ALA Annual Conference. More details:
http://connect.ala.org/node/176080
LITA Mobile Computing
I create 50 GB files of MARCXML all the time. We do NOT put a wrapper
element around them, but we do put a line feed at the end of each record.
Then a trivial line-reading loop in Java/Perl/whatever can read those
records individually and process them appropriately.
That turns out to be the right
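A rough Java version of that loop might look like this (sketch; the file
name and the process() call are placeholders for whatever crosswalk you run
per record):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LineDelimitedMarcXml {
    public static void main(String[] args) throws IOException {
        // Each line is one complete <record>...</record>; there is no wrapper element.
        try (BufferedReader in = new BufferedReader(new FileReader("records.marcxml"))) {
            String line;
            long count = 0;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue;
                process(line);   // hand the single-record XML string to any parser you like
                count++;
            }
            System.err.println(count + " records processed");
        }
    }

    private static void process(String recordXml) {
        // placeholder: parse recordXml, crosswalk it, write the output
    }
}

Memory use stays flat no matter how big the file gets, since only one record
is ever in memory.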
This is something I've dealt with. And for a variety of reasons, we
went with the streaming parser. I'm not sure about the quality of your
data, but we have to be prepared for seriously messed up data. There
was no way I was going to develop a process that would try to load a
15 million record
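For anyone who hasn't written one before, the Java SAX skeleton for that
kind of streaming pass is roughly the following (sketch; the "record"
element name is an assumption about the input, not from the original post):

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParserFactory;
import java.io.File;

public class RecordCounter extends DefaultHandler {
    private long records = 0;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if ("record".equals(local)) {
            text.setLength(0);          // start collecting a new record
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if ("record".equals(local)) {
            records++;                  // crosswalk/emit one record here, then forget it
        }
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        RecordCounter handler = new RecordCounter();
        factory.newSAXParser().parse(new File(args[0]), handler);
        System.out.println(handler.records + " records");
    }
}

Nothing is kept in memory beyond the record currently being handled.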
On Jun 8, 2012, at 2:36 PM, Kyle Banerjee wrote:
I'm working on a script that needs to be able to crosswalk at least a
couple hundred XML files regularly, some of which are quite large.
[trimmed]
How do you guys deal with large XML files? Thanks,
um ... I return ASCII tab-delim records,
It is also worth noting that you can usually do SAX-style parsing in
most XML parsing libraries that are normally associated with DOM-style
parsing and conveniences like XPath selectors. For example, Nokogiri
does SAX and it is *very* fast:
http://nokogiri.org/Nokogiri/XML/SAX/Document.html
As a
One way to get the best of both worlds (the scalability of a streaming parser with
the convenience of DOM) is to use DOM4J's ElementHandler interface[1]. You parse
the XML file using a SAXReader, and register a class to handle callbacks, based
on an XPath expression. I used this approach to break up
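The basic shape is something like this (sketch; the /collection/record path
is just an example, not from the original post):

import org.dom4j.Element;
import org.dom4j.ElementHandler;
import org.dom4j.ElementPath;
import org.dom4j.io.SAXReader;
import java.io.File;

public class RecordSplitter {
    public static void main(String[] args) throws Exception {
        SAXReader reader = new SAXReader();
        // Fire a callback every time a /collection/record element is completed.
        reader.addHandler("/collection/record", new ElementHandler() {
            public void onStart(ElementPath path) {
                // nothing needed at the start tag
            }
            public void onEnd(ElementPath path) {
                Element record = path.getCurrent();
                // ... crosswalk the record here ...
                record.detach();   // prune it so the in-memory tree stays small
            }
        });
        reader.read(new File(args[0]));
    }
}

The detach() call is the important part; without it the tree keeps growing
as the file is read.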
If you're not averse to Java, the XOM XML library has a nice
NodeFactory class that you can override to control the processing of
the XML document. For instance, it will let you parse a very large
XML document like

  <root>
    <rec>...</rec>
    <rec>...</rec>
    ...
  </root>

while keeping only one rec at a time in memory.
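Something along these lines (sketch; assumes the rec elements above):

import nu.xom.Builder;
import nu.xom.Element;
import nu.xom.NodeFactory;
import nu.xom.Nodes;
import java.io.File;

public class StreamingRecordFactory extends NodeFactory {
    private final Nodes empty = new Nodes();

    @Override
    public Nodes finishMakingElement(Element element) {
        if ("rec".equals(element.getLocalName())) {
            // ... crosswalk this one rec here ...
            return empty;   // discard it so only one rec is ever held in memory
        }
        return super.finishMakingElement(element);
    }

    public static void main(String[] args) throws Exception {
        Builder builder = new Builder(new StreamingRecordFactory());
        builder.build(new File(args[0]));
    }
}

Returning an empty Nodes from finishMakingElement() is what keeps the tree
from growing.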
Since you mentioned SimpleXML, Kyle, I assume you're using PHP?
If so, you might look at XMLReader [1], which is a pull parser, and should give
you better performance on large files than SimpleXML.
It is still based on libxml, though, so if that is still not fast enough for
you, you can
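For the Java-inclined, the same pull-parsing pattern looks roughly like this
with StAX (not PHP's XMLReader, just the analogous idea; the "record"
element name is assumed):

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;

public class PullParseDemo {
    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new FileInputStream(args[0]));
        long records = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "record".equals(reader.getLocalName())) {
                records++;   // pull out whatever fields you need here
            }
        }
        reader.close();
        System.out.println(records + " records");
    }
}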
The Digital Services Librarian provides expertise in creating and managing
library digital collections, such as digital special collections, electronic
theses and dissertations, and other born-digital or retrospectively digitized
materials. This position assumes primary responsibility for the
Since you mentioned SimpleXML, Kyle, I assume you're using PHP?
Actually I'm using Perl. For reasons not related to XML parsing, it is the
preferred (but not mandatory) language.
Based on a few tests and manual inspection, it looks like the ticket for me
is going to be a two-stage process
*sigh* -- I kinda wish this whole discussion got captured in
http://libraries.stackexchange.com/ ...
Peter
On Jun 8, 2012, at 2:36 PM, Kyle Banerjee wrote:
I'm working on a script that needs to be able to crosswalk at least a
couple hundred XML files regularly, some of which are quite large.
One tangent that I know about is the Memento work:
https://www.mediawiki.org/wiki/Extension:Memento
Peter
On Jun 8, 2012, at 2:18 PM, Klein, Max wrote:
Hello Silicon Sorcerers,
I was just wondering if there have been any efforts from Code4Lib into
MediaWiki development? I know that