Re: [Dspace-tech] standards to facilitate metadata extraction during text extraction

2008-12-15 Thread François Parmentier
During my PhD, this was still a research subject (automatic extraction of
data from physical structure of a document).
Have a look at http://www.loria.fr/equipes/read/
I don't know whether there have been free or proprietary systems since then.

When the layout of your documents is a regular one, some rather simple
process may be useful, but if it varies too much, it is a much more
complicated task!
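As a rough illustration of the "regular layout" case, here is a minimal heuristic sketch. It assumes that the extracted plain text keeps its line breaks, that the first non-blank line is the title, that a line reading "Abstract" opens an abstract ending at the next blank line, and that a line reading "References" or "Bibliography" opens the reference list. The function name and these rules are hypothetical. Real layouts will break them quickly, which is exactly the hard case described above.

```python
def guess_front_matter(text: str) -> dict:
    """Heuristically pull title, abstract and references out of extracted text.

    Hypothetical rules, only valid for a very regular layout:
      * the first non-blank line is the title;
      * a line reading "Abstract" starts the abstract, which ends at the
        next blank line;
      * a line reading "References" or "Bibliography" starts the reference
        list, one non-blank line per entry, until the end of the text.
    """
    lines = [line.strip() for line in text.splitlines()]
    nonblank = [line for line in lines if line]
    result = {"title": nonblank[0] if nonblank else None,
              "abstract": None,
              "references": []}

    section = None          # which block we are currently inside
    abstract_lines = []
    for line in lines:
        heading = line.lower().rstrip(":")
        if heading == "abstract":
            section = "abstract"
            continue
        if heading in ("references", "bibliography"):
            section = "references"
            continue
        if section == "abstract":
            if not line:
                # A blank line after some abstract text closes the abstract.
                if abstract_lines:
                    section = None
                continue
            abstract_lines.append(line)
        elif section == "references" and line:
            result["references"].append(line)

    result["abstract"] = " ".join(abstract_lines) or None
    return result


sample = """A Study of Things
J. Smith

Abstract
We study things.
They are good.

1 Introduction
Some text.

References
[1] Foo.
[2] Bar.
"""
print(guess_front_matter(sample))
```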
--
François PARMENTIER / INIST-CNRS

On Sun, Dec 14, 2008 at 12:52 AM, Andrew Marlow 
marlow.and...@googlemail.com wrote:

 This may seem like a crazy or naive question, but is there any standard
 laid down by publishers or societies that authors must adhere to so that the
 extraction of metadata from articles can be easily automated? Having just
 performed a text extraction on a non-searchable PDF I see that there is no
 easy way to get any metadata out. But if a society had conventions for the
 layout of the article, specifying location and format of title, authors,
 abstract, bibliography etc, then it might be possible. I have seen a very
 regular visual layout in the PDFs from some places. Using OCR techniques it
 might be possible to locate blocks of interest. It might also be possible
 from a text extraction but that might be harder since all visual layout
 information is gone (at least it was with the tool I used). I wonder if this
 is being considered by anyone. I am very new to this area so please excuse
 me if this seems like a silly question.
 --
 Regards,

 Andrew M.


 --
 SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, Nevada.
 The future of the web can't happen without you.  Join us at MIX09 to help
 pave the way to the Next Web now. Learn more and register at

 http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/
 ___
 DSpace-tech mailing list
 DSpace-tech@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/dspace-tech




Re: [Dspace-tech] Citation format

2008-12-15 Thread Robin Taylor
You need to add new terms to the metadata registry using the dspace-admin 
function, and then add those new terms to input-forms.xml to make them appear 
on the HTML forms. That part is fairly easy and is a typical thing that a DSpace 
admin would do. The messier part is that there is no widely recognised 
metadata schema that will allow you to store this information, at least not in 
the context of DSpace. I think that most people with a similar requirement make 
up their own terms and possibly their own schema, e.g. mine:volume, mine:issue, etc. The 
next tricky bit is that the DSpace submission process is not 'type' based, so 
how do you request the appropriate information depending on type? You can 
either ignore type and always ask the user for all citation data, even when 
it is not appropriate (e.g. volume/issue for a book), or you can split your 
collections by type, since input-forms.xml allows you to define different forms for 
different collections.
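To make the second option concrete, here is a minimal Python sketch of type-dependent field lists. It assumes you have split collections by type and that each type's form asks for its own citation fields. The "mine" schema, the type names and the helper functions are hypothetical. In DSpace itself the equivalent configuration lives in the metadata registry and input-forms.xml, not in code.

```python
# Hypothetical field lists: "mine" is a locally invented schema, as in the
# mine:volume / mine:issue example above; nothing here is a DSpace default.
COMMON_FIELDS = ["dc.title", "dc.contributor.author", "dc.date.issued"]

TYPE_FIELDS = {
    "journal article": ["mine.journal", "mine.volume", "mine.issue", "mine.pages"],
    "book": ["dc.publisher"],
    "book chapter": ["mine.booktitle", "dc.publisher", "mine.pages"],
}


def fields_for(doc_type: str) -> list[str]:
    """Fields a type-specific form would ask for; unknown types get only the common ones."""
    return COMMON_FIELDS + TYPE_FIELDS.get(doc_type, [])


def missing_fields(doc_type: str, record: dict[str, str]) -> list[str]:
    """Fields required for this type that are absent or empty in the record."""
    return [field for field in fields_for(doc_type) if not record.get(field)]


article = {
    "dc.title": "On Things",
    "dc.contributor.author": "Smith, J.",
    "dc.date.issued": "2008",
    "mine.journal": "Journal of Things",
    "mine.volume": "12",
    "mine.pages": "142-163",
}
print(missing_fields("journal article", article))  # only the issue number is missing
```

Keeping the lists per type is what avoids asking for volume/issue on a book; the cost is one list (or, in DSpace, one form) per type to maintain.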

Cheers, Robin.



Robin Taylor
Main Library
University of Edinburgh
Tel. 0131 6515208  

 -Original Message-
 From: juuventud [mailto:s.m...@ru.ac.za] 
 Sent: 13 December 2008 17:55
 To: dspace-tech@lists.sourceforge.net
 Subject: [Dspace-tech] Citation format
 
 
 Hi all
 
 In order to have the proper citation format for different 
 types of documents, eg. journals, books, book chapters, etc, 
 I need to have fields where I can enter things like Journal 
 Name (not publisher), Volume number, Part number, Pagination 
 (eg. pg142 - pg163), etc.
 I don't know how to do this. Is there a standard set of 
 submission pages with these fields already available? If not, 
 where can I create such forms and how do I create the link 
 from the form to specific fields in the database?
 Any help would be GREATLY APPRECIATED.
 
 I'm using DSpace 1.4.2 with PostgreSQL 8.1 on Windows Server 2003.
 
 Many thanks in advance
 --
 View this message in context: 
 http://www.nabble.com/Citation-format-tp20992675p20992675.html
 Sent from the DSpace - Tech mailing list archive at Nabble.com.
 
 
 
 



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




[Dspace-tech] HANDLE update issue

2008-12-15 Thread McGee, Thomas A.
Following up on my question of last week on the failed UPDATE queries for 
changing handles on an Oracle database, this is what happened.

Stuart Lewis suggested that the failing query was this:
UPDATE metadatavalue SET text_value= (SELECT 'http://hdl.handle.net/' || handle 
FROM handle WHERE handle.resource_id=item_id AND
handle.resource_type_id=2) WHERE  text_value LIKE 'http://hdl.handle.net/%';

When I ran that in the Oracle SQL Developer application, I got an error 
something like no statement at cursor. I simply deleted the semicolon and ran:

UPDATE metadatavalue SET text_value= (SELECT 'http://hdl.handle.net/' || handle 
FROM handle WHERE handle.resource_id=item_id AND
handle.resource_type_id=2) WHERE  text_value LIKE 'http://hdl.handle.net/%'

Which worked. Could that really be all that it is?
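For anyone following along, the statement above rewrites every metadata value that starts with http://hdl.handle.net/ so that it points at the item's current handle (resource_type_id 2 is DSpace's type code for items). The sketch below replays it against a tiny in-memory SQLite database with a simplified, assumed schema. The real DSpace metadatavalue table has more columns, and the sample handles are made up. It is only meant to show what the correlated subquery does, not to stand in for Oracle.

```python
import sqlite3

# The UPDATE from the message above, without the trailing semicolon.
UPDATE_SQL = """
UPDATE metadatavalue SET text_value= (SELECT 'http://hdl.handle.net/' || handle
FROM handle WHERE handle.resource_id=item_id AND
handle.resource_type_id=2) WHERE  text_value LIKE 'http://hdl.handle.net/%'
"""


def replay_update() -> list[tuple[int, str]]:
    conn = sqlite3.connect(":memory:")
    # Simplified, assumed versions of the two DSpace tables involved.
    conn.execute("CREATE TABLE handle (handle TEXT, resource_type_id INTEGER, resource_id INTEGER)")
    conn.execute("CREATE TABLE metadatavalue (item_id INTEGER, text_value TEXT)")
    # Items 1 and 2 now have handles under a new, made-up prefix.
    conn.executemany("INSERT INTO handle VALUES (?, ?, ?)",
                     [("12345/7", 2, 1), ("12345/8", 2, 2)])
    # Their stored URIs still use the old default prefix 123456789.
    conn.executemany("INSERT INTO metadatavalue VALUES (?, ?)",
                     [(1, "http://hdl.handle.net/123456789/7"),
                      (2, "http://hdl.handle.net/123456789/8"),
                      (1, "A title that the LIKE clause leaves alone")])
    conn.execute(UPDATE_SQL)
    rows = conn.execute(
        "SELECT item_id, text_value FROM metadatavalue ORDER BY rowid").fetchall()
    conn.close()
    return rows


for row in replay_update():
    print(row)
```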

_
Tom McGee
Senior Digital Media Specialist
Seton Hall University
400 South Orange Ave., South Orange, NJ 07079
973.275.2992





Re: [Dspace-tech] HANDLE update issue

2008-12-15 Thread Stuart Lewis [sdl]
Hi Tom,

 

Would you mind trying an experiment for us? If you remove the semicolon
from
[dspace-src]/dspace-api/src/main/java/org/dspace/handle/UpdateHandlePrefix.java,
run mvn package and ant update, does the script
[dspace]/bin/update-handle-prefix then run OK?

 

I suspect it will fix the problem.

 

If you could confirm this, we'll get it fixed ready for the next release
of DSpace.
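The change suggested above amounts to not sending a statement terminator to the database. Interactive clients accept a trailing ';', but Oracle's JDBC driver typically rejects it (usually as ORA-00911, invalid character). DSpace's own fix belongs in the Java source. The Python helper below is only a hypothetical illustration of the same idea for scripts that build SQL strings.

```python
def strip_statement_terminator(sql: str) -> str:
    """Return sql without trailing whitespace and a single trailing ';'.

    Hypothetical helper: some drivers (Oracle's JDBC driver, for one) reject
    a statement terminator that interactive clients accept.
    """
    stripped = sql.rstrip()
    if stripped.endswith(";"):
        stripped = stripped[:-1].rstrip()
    return stripped


print(strip_statement_terminator("UPDATE metadatavalue SET text_value = 'x';\n"))
```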

 

Thanks,

 

 

Stuart

 

From: McGee, Thomas A. [mailto:thomas.mc...@shu.edu] 
Sent: 15 December 2008 14:56
To: dspace-tech@lists.sourceforge.net
Subject: [Dspace-tech] HANDLE update issue

 

Following up on my question of last week on the failed UPDATE queries
for changing handles on an Oracle database, this is what happened. 

 

Stuart Lewis suggested that the failing query was this: 

UPDATE metadatavalue SET text_value= (SELECT 'http://hdl.handle.net/' ||
handle FROM handle WHERE handle.resource_id=item_id AND
handle.resource_type_id=2) WHERE  text_value LIKE
'http://hdl.handle.net/%';

 

When I ran that in the Oracle SQL Developer application, I got an error
something like no statement at cursor. I simply deleted the semicolon
and ran:

 

UPDATE metadatavalue SET text_value= (SELECT 'http://hdl.handle.net/' ||
handle FROM handle WHERE handle.resource_id=item_id AND
handle.resource_type_id=2) WHERE  text_value LIKE
'http://hdl.handle.net/%'

 

Which worked. Could that really be all that it is?

 

_

Tom McGee

Senior Digital Media Specialist

Seton Hall University

400 South Orange Ave., South Orange, NJ 07079

973.275.2992

 

 

 



Re: [Dspace-tech] standards to facilitate metadata extraction during text extraction

2008-12-15 Thread Christophe Dupriez
At the end of the 1990s, I used MS-Word forms and macros to allow authors to 
enter metadata together with their articles. Even references were 
structured.

It seemed a good idea (normalizing upfront).

It ended up very badly because:
* Macintosh MS-Word was not compatible with the forms and macros;
* WordPerfect was still popular and presented as being compatible 
(which was not true for forms and macros);
  Worst of all, one of the reviewers opened most of the articles in 
  WordPerfect and saved them after adding comments...
* Asian versions of Word were introducing characters unknown to Western 
versions;
* About a quarter of the authors did not understand the form.

Those (technical?) problems produced a terrible mess which took a very 
long time to correct and delayed the publication of the paper.


Efficient cataloguers (possibly with the help of a submission form like 
the DSpace one + a better cataloguing form than the current one) will 
always be better than machines at taming the authors' diversity!


Have a nice day!

Christophe Dupriez

François Parmentier wrote:
During my PhD, this was still a research subject (automatic extraction 
of data from physical structure of a document).

Have a look at http://www.loria.fr/equipes/read/
I don't know whether there have been free or proprietary systems since 
then.


When the layout of your documents is a regular one, some rather simple 
process may be useful, but if it varies too much, it is a much more 
complicated task!

--
François PARMENTIER / INIST-CNRS

On Sun, Dec 14, 2008 at 12:52 AM, Andrew Marlow 
marlow.and...@googlemail.com mailto:marlow.and...@googlemail.com 
wrote:


This may seem like a crazy or naive question, but is there any
standard laid down by publishers or societies that authors must
adhere to so that the extraction of metadata from articles can be
easily automated? Having just performed a text extraction on a
non-searchable PDF I see that there is no easy way to get any
metadata out. But if a society had conventions for the layout of
the article, specifying location and format of title, authors,
abstract, bibliography etc, then it might be possible. I have seen
a very regular visual layout in the PDFs from some places. Using
OCR techniques it might be possible to locate blocks of interest.
It might also be possible from a text extraction but that might be
harder since all visual layout information is gone (at least it
was with the tool I used). I wonder if this is being considered by
anyone. I am very new to this area so please excuse me if this
seems like a silly question.
-- 
Regards,


Andrew M.










[Dspace-tech] Controlled Vocabulary Filtering

2008-12-15 Thread Hilderbrand, Barbara (GSFC-999.9)[PEND]
Hello,
 
I have recently uploaded a controlled vocabulary into Dspace and it is
working with the submission forms searching within the metadata.  The
problem is that when I go to subject search and try to filter my
controlled vocabulary I just get a blank page.  I've been researching if
there is something I have to do to set up the filter but I can't find
anything.  Can anyone offer assistance?
 
Thank you in advance,
 
Barbara
 
Barbara Yates Hilderbrand, MLS
Metadata/Digital Collections Librarian, Library Associates
NASA/GSFC Library
Code 272, Building 21
Greenbelt, MD 20771
301-286-6246
barbara.y.hilderbr...@nasa.gov

 


Re: [Dspace-tech] standards to facilitate metadata extraction during text extraction

2008-12-15 Thread Robin Taylor
I don't think it's a daft question at all, but then I am known to ask some very 
daft ones myself :)  

I think the problem is that we wrap the data up in formats that make extraction 
difficult and then need to go to great lengths to try and extract that data. I 
don't know of any widely used, reliable methods as yet. Better to move towards 
formats that make extraction easy. Microsoft docx documents look like a step 
in the right direction to me. It's a normal Word document but is stored as XML 
and hence is readable programmatically. In addition, the author can add their own 
tags, so there is no reason why they should not tag the abstract, references, 
etc. In theory it should then be easy to extract that information.

I'm sure there are good reasons why we all favour PDFs, but I think the 
principle still applies.
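As a small illustration of "readable programmatically": a .docx file is a ZIP package, and its standard document properties (title, author and so on) sit in a part called docProps/core.xml using Dublin Core element names. The sketch below reads them with only the Python standard library. It covers the built-in properties, not author-defined tags for an abstract or references, which would need document-specific handling. The function name is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used in docProps/core.xml of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def docx_core_properties(data: bytes) -> dict[str, str]:
    """Return the non-empty Dublin Core properties stored in a .docx package."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    props = {}
    for name in ("dc:title", "dc:creator", "dc:subject", "dc:description"):
        element = root.find(name, NS)
        if element is not None and element.text:
            props[name] = element.text
    return props


# Demo on a minimal, hand-built package containing only the core properties part.
CORE_XML = (
    "<cp:coreProperties"
    ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>A Study of Things</dc:title><dc:creator>J. Smith</dc:creator>"
    "</cp:coreProperties>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("docProps/core.xml", CORE_XML)
print(docx_core_properties(buffer.getvalue()))
```

For a real file you would pass the file's bytes, e.g. `docx_core_properties(open("paper.docx", "rb").read())`, where "paper.docx" is a placeholder name.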

Cheers, Robin.




Robin Taylor
Main Library
University of Edinburgh
Tel. 0131 6515208  

 -Original Message-
 From: Andrew Marlow [mailto:marlow.and...@googlemail.com] 
 Sent: 13 December 2008 23:53
 To: dspace-tech@lists.sourceforge.net
 Subject: [Dspace-tech] standards to facilitate metadata 
 extraction during text extraction
 
 This may seem like a crazy or naive question, but is there 
 any standard laid down by publishers or societies that 
 authors must adhere to so that the extraction of metadata 
 from articles can be easily automated? Having just performed 
 a text extraction on a non-searchable PDF I see that there is 
 no easy way to get any metadata out. But if a society had 
 conventions for the layout of the article, specifying 
 location and format of title, authors, abstract, bibliography 
 etc, then it might be possible. I have seen a very regular 
 visual layout in the PDFs from some places. Using OCR 
 techniques it might be possible to locate blocks of interest. 
 It might also be possible from a text extraction but that 
 might be harder since all visual layout information is gone 
 (at least it was with the tool I used). I wonder if this is 
 being considered by anyone. I am very new to this area so 
 please excuse me if this seems like a silly question.
 --
 Regards,
 
 Andrew M.
 





Re: [Dspace-tech] Citation format

2008-12-15 Thread juuventud

Thanks a lot Robin. Question answered.





...
-- 
View this message in context: 
http://www.nabble.com/Citation-format-tp20992675p21013005.html
Sent from the DSpace - Tech mailing list archive at Nabble.com.




Re: [Dspace-tech] standards to facilitate metadata extraction during text extraction

2008-12-15 Thread Andrew Marlow
On Mon, Dec 15, 2008 at 9:36 AM, Robin Taylor robin.tay...@ed.ac.uk wrote:

 I don't think it's a daft question at all, but then I am known to ask some
 very daft ones myself :)

 I think the problem is that we wrap the data up in formats that make
 extraction difficult and then need to go to great lengths to try and extract
 that data. I don't know of any widely used, reliable methods as yet. Better
 to move towards formats that make extraction easy. Microsoft docx documents
 look like a step in the right direction to me.


No, no, no, please let us not use formats invented by Microsoft. We need
open formats not closed-secret-proprietary ones. And if Microsoft claim it
is open we must not believe them. Just look at their track record. I realise
that PDFs are not completely open either but they are bound to be more open
than anything Microsoft produce. And I was talking about PDFs.

But I do not want the discussion to focus on file formats. As I said
originally,

 But if a society had
 conventions for the layout of the article, specifying
 location and format of title, authors, abstract, bibliography
 etc, then it might be possible


 -Original Message-
  From: Andrew Marlow [mailto:marlow.and...@googlemail.com]
  Sent: 13 December 2008 23:53
  To: dspace-tech@lists.sourceforge.net
  Subject: [Dspace-tech] standards to facilitate metadata
  extraction during text extraction
 
  This may seem like a crazy or naive question, but is there
  any standard laid down by publishers or societies that
  authors must adhere to so that the extraction of metadata
  from articles can be easily automated? Having just performed
  a text extraction on a non-searchable PDF I see that there is
  no easy way to get any metadata out. But if a society had
  conventions for the layout of the article, specifying
  location and format of title, authors, abstract, bibliography
  etc, then it might be possible. I have seen a very regular
  visual layout in the PDFs from some places. Using OCR
  techniques it might be possible to locate blocks of interest.
  It might also be possible from a text extraction but that
  might be harder since all visual layout information is gone
  (at least it was with the tool I used). I wonder if this is
  being considered by anyone.

-- 
Regards,

Andrew M.


[Dspace-tech] Ingest information in Dspace

2008-12-15 Thread yo
Hello!   :-)

I need to ingest some information into my university's DSpace. According
to the information provided by DSpace at
http://www.dspace.org/index.php/Architecture/technology/metadata.html,
all of that information can be uploaded. I am using the example from
http://www.ukoln.ac.uk/repositories/sword/example.zip to upload some
information about publications (author, abstract, language).

The problem is that I cannot find a way to ingest other information I
need, such as ISBN, govdoc, ISSN, ISMN ...
Has any of you got an XML example showing how to upload metadata such as:
dc.contributor.author dc.contributor.author dc.date.accessioned
dc.date.available dc.date.issued dc.identifier.citation
dc.identifier.govdoc dc.identifier.isbn dc.identifier.issn
dc.identifier.pmid dc.identifier.doi
dc.identifier.other dc.identifier.uri dc.identifier.uri
dc.description dc.description.abstract dc.language.iso dc.publisher
dc.relation.ispartofseries dc.relation.ispartofseries
dc.relation.ispartofseries dc.relation.ispartofseries
dc.relation.url dc.relation.url dc.subject dc.subject dc.subject
dc.subject dc.subject.mesh dc.subject.mesh dc.subject.mesh
dc.subject.mesh dc.subject.other dc.subject.other dc.title
dc.title.alternative dc.title.alternative dc.type
dc.contributor.department dc.identifier.journal dc.identifier.pmcid

Thanks a lot to all of you for your help,

Javier Espinosa de los Monteros.
University of Wolverhampton
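One possible route, assuming you can use DSpace's batch item importer rather than SWORD: the importer's Simple Archive Format gives each item a dublin_core.xml file in which every value is a dcvalue element with element and qualifier attributes (qualifier="none" for unqualified fields). The Python sketch below writes such a file from a list of (field, value) pairs, so a repeated field like dc.contributor.author simply appears twice. The function name and sample values are hypothetical. Any element/qualifier that is not already in your metadata registry would have to be added there first, so check fields such as dc.identifier.pmcid before importing.

```python
import xml.etree.ElementTree as ET


def dublin_core_xml(values: list[tuple[str, str]]) -> str:
    """Build a Simple Archive Format dublin_core.xml document.

    values holds (field, text) pairs such as ("dc.identifier.isbn", "...");
    repeat a field to give it several values.
    """
    root = ET.Element("dublin_core")
    for field, text in values:
        parts = field.split(".")
        if parts[0] != "dc" or len(parts) not in (2, 3):
            raise ValueError(f"expected dc.element or dc.element.qualifier, got {field!r}")
        qualifier = parts[2] if len(parts) == 3 else "none"
        value = ET.SubElement(root, "dcvalue", element=parts[1], qualifier=qualifier)
        value.text = text  # ElementTree escapes &, < and > when serialising
    return ET.tostring(root, encoding="unicode")


print(dublin_core_xml([
    ("dc.title", "An Example Title"),
    ("dc.contributor.author", "Smith, J."),
    ("dc.contributor.author", "Jones, K."),
    ("dc.identifier.isbn", "0-000-00000-0"),  # placeholder, not a real ISBN
    ("dc.identifier.issn", "0000-0000"),      # placeholder
]))
```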




Re: [Dspace-tech] standards to facilitate metadata extraction during text extraction

2008-12-15 Thread Mark H. Wood
On Mon, Dec 15, 2008 at 09:36:11AM +, Robin Taylor wrote:
 I think the problem is that we wrap the data up in formats that make
 extraction difficult and then need to go to great lengths to try and
 extract that data. I don't know of any widely used, reliable methods as
 yet. Better to move towards formats that make extraction easy.

Most common formats other than plain text have some sort of tagging
feature.  In some cases, few know about them so they aren't much
used.  That could be fixed easily.

 Microsoft docx documents look like a step in the right direction to
 me. It's a normal Word document but is stored as XML and hence is
 readable programmatically.

The older Office formats are readable programmatically too.  More
readable, actually, since OOXML is very new, still only partially
documented, and not implemented anywhere, even at Microsoft.  There's
a store for document attributes inside the traditional Office format's
bag.  There's a nice Java library (POI) that can extract them.

But then that only works for MS Office documents.  Not for OpenOffice
or Symphony.  Not for Acrobat.  We have tens of thousands of PDFs.  We
have audio and video streams waiting in the wings.

And we still need a system for assigning meanings to the tags.

   In addition the author can add their own
 tags, so there is no reason why they should not tag the abstract,
 references, etc. In theory it should be easy to then extract that
 information.

See the subject line.  If everybody makes up his own tags then there
is no standard, and software cannot make use of the tags without being
told, for each individual provider's profile, what to look for and
what they mean.  Bibliographic software like EndNote shows us what we
wind up with: hundreds of format modules to be maintained.  We can do
that but I'd rather have something systematic.  (BTW EndNote or one of
its brethren might be able to serve the original request.)

If there is no standard now, then maybe it's up to the document
repository community (that's us) to lay the groundwork for some
standardization and champion the idea until it's accepted.
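The per-provider profile problem described above can be sketched as a crosswalk table. Each provider's own tag names are mapped onto one shared vocabulary (Dublin Core-style keys here), and anything unmapped is set aside for a human. The provider names and tags are invented for illustration. Every new provider needs another entry, which is the maintenance cost a shared standard would remove.

```python
# Hypothetical per-provider profiles mapping each provider's own tag names
# onto one shared vocabulary. Every new provider needs another entry here.
PROFILES = {
    "society_a": {"ArticleTitle": "dc.title", "Abstract": "dc.description.abstract"},
    "society_b": {"title": "dc.title", "summary": "dc.description.abstract",
                  "author": "dc.contributor.author"},
}


def normalise(provider: str, tagged: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Translate one provider's tags; return (mapped, unmapped)."""
    profile = PROFILES[provider]  # KeyError: no profile written for this provider yet
    mapped, unmapped = {}, {}
    for tag, value in tagged.items():
        if tag in profile:
            mapped[profile[tag]] = value
        else:
            unmapped[tag] = value  # left for a cataloguer to decide
    return mapped, unmapped


mapped, unmapped = normalise("society_b", {"title": "On Things",
                                           "summary": "We study things.",
                                           "keywords": "things"})
print(mapped)
print(unmapped)
```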

-- 
Mark H. Wood, Lead System Programmer   mw...@iupui.edu
Friends don't let friends publish revisable-form documents.




[Dspace-tech] structure import problem with French and German accented characters

2008-12-15 Thread Andrew Marlow
I have created an XML file for a structure import, based on a CSV file I
have of journal titles. I am converting the CSV to XML using a bit of perl.
Everything is fine until I introduce journal titles that contain accented
characters. For example, one title contains the German word 'für' (u with
umlaut). I get a UTF-8 error if I leave it like that. So in my XML file I
changed it to &uuml;, but that doesn't work either. It says 'the entity uuml was
referenced but not declared'. What is going wrong, please? How can titles
with accented characters be imported?
-- 
Regards,

Andrew M.


[Dspace-tech] how to zap the DSpace database quickly and start again

2008-12-15 Thread Andrew Marlow
Now that I have done several experiments with bulk imports using
StructBuilder, my DSpace database is full of rubbish. Can anyone please tell
me what is the best way to zap the database so I can start again? I don't
want to do a complete reinstall of DSpace, I would lose config info and
XMLUI customisations that I want to keep.

-- 
Regards,

Andrew M.


Re: [Dspace-tech] structure import problem with French and German accented characters

2008-12-15 Thread Walker, David
Hi Andrew,

I can't speak to the general question.  But on this point . . .

   &uuml;

. . . is an HTML character entity reference, and is not recognized within XML 
documents in general.  To use it, you would need, as the XML parser here is 
saying, a supporting DTD entity declaration.

It may be easier, I think, just to reference the character by its numeric value:

   &#252;
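Andrew mentioned generating the file from CSV with a bit of Perl. The Python sketch below shows the same idea with the standard library: the XML library does the escaping, and every non-ASCII character is then written as a numeric reference such as &#252;. Writing real UTF-8 should also work, provided the bytes actually are UTF-8 (a CSV saved in a Windows code page is often the cause of UTF-8 errors). The element names import_structure, community, collection and name are assumed to follow the structure-builder input format, and the function name is hypothetical. Check both against the DSpace documentation for your version.

```python
import csv
import io
import xml.etree.ElementTree as ET


def titles_to_structure_xml(csv_text: str, community_name: str) -> bytes:
    """Build a structure-import XML document with one collection per CSV row.

    Assumed layout: import_structure > community > collection, each with a
    name child. Only the first CSV column (the journal title) is used.
    """
    root = ET.Element("import_structure")
    community = ET.SubElement(root, "community")
    ET.SubElement(community, "name").text = community_name
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        collection = ET.SubElement(community, "collection")
        ET.SubElement(collection, "name").text = row[0].strip()
    # Serialise to text (ElementTree escapes &, < and >), then encode as ASCII,
    # replacing every non-ASCII character with a numeric reference like &#252;.
    body = ET.tostring(root, encoding="unicode").encode("ascii", "xmlcharrefreplace")
    return b'<?xml version="1.0" encoding="UTF-8"?>\n' + body


xml_bytes = titles_to_structure_xml(
    "Zeitschrift für Physik\nRevue française de sociologie\n", "Journals")
print(xml_bytes.decode("ascii"))
```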

--Dave


==
David Walker
Library Web Services Manager
California State University
http://xerxes.calstate.edu


From: Andrew Marlow [marlow.and...@googlemail.com]
Sent: Monday, December 15, 2008 2:05 PM
To: dspace-tech@lists.sourceforge.net
Subject: [Dspace-tech] structure import problem with French and German accented 
characters


I have created an XML file for a structure import, based on a CSV file I have 
of journal titles. I am converting the CSV to XML using a bit of perl. 
Everything is fine until I introduce journal titles that contain accented 
characters. For example, one title contains the German word 'fur' with u 
umlaut. I get a UTF-8 error if I leave it like that. So in my XML file I change 
this for &uuml; but it doesn't work. It says 'the entity uuml was referenced 
but not declared'. What is going wrong please? How may titles with accented 
characters be imported?
--
Regards,

Andrew M.



[Dspace-tech] StructBuilder says I am not authorized

2008-12-15 Thread Andrew Marlow
I am now having trouble running the StructBuilder. Here is the error I get:

Using DSpace installation in:
G:\mystuff\tools\dspace-1.5.1-src-release\dspace\target\dspace-1.5.1-build.dir
Exception in thread "main" org.dspace.authorize.AuthorizeException: Only
administrators can create communities
at org.dspace.content.Community.create(Community.java:193)

This used to work! The only thing I have done different is I blew away the
database because previous imports filled it with rubbish. I stopped tomcat,
dropped the database, said ant fresh_install and restarted tomcat. I must
have missed off something, but what?
-- 
Regards,

Andrew M.


Re: [Dspace-tech] StructBuilder says I am not authorized

2008-12-15 Thread James Rutherford
On Mon, Dec 15, 2008 at 11:17:45PM +, Andrew Marlow wrote:
 I am now having trouble running the StructBuilder. Here is the error I get:
 
 Using DSpace installation in: 
 G:\mystuff\tools\dspace-1.5.1-src-release\dspace\target\dspace-1.5.1-build.dir
 Exception in thread "main" org.dspace.authorize.AuthorizeException: Only 
 administrators can create communities
 at org.dspace.content.Community.create(Community.java:193)
 
 This used to work! The only thing I have done different is I blew away
 the database because previous imports filled it with rubbish. I
 stopped tomcat, dropped the database, said ant fresh_install and
 restarted tomcat. I must have missed off something, but what?

Sounds like you forgot to re-create the administrator account (see the
install instructions).

cheers,

Jim

-- 
James Rutherford  |  Hewlett-Packard Limited registered Office:
Research Engineer |  Cain Road,
HP Labs   |  Bracknell,
Bristol, UK   |  Berks
+44 117 312 7066  |  RG12 1HN.
james.rutherf...@hp.com   |  Registered No: 690597 England

The contents of this message and any attachments to it are confidential
and may be legally privileged. If you have received this message in
error, you should delete it from your system immediately and advise the
sender. To any recipient of this message within HP, unless otherwise
stated you should consider this message and attachments as HP
CONFIDENTIAL.



Re: [Dspace-tech] StructBuilder says I am not authorized

2008-12-15 Thread Claudia Jürgen
Hi Andrew,

as you dropped the database and did a fresh_install, there is no 
administrator any more. Run [dspace]/bin/create-administrator and then try 
the structure-builder again with this eperson.

Hope that helps

Claudia Jürgen


Andrew Marlow wrote:
 I am now having trouble running the StructBuilder. Here is the error I get:
 
 Using DSpace installation in:
 G:\mystuff\tools\dspace-1.5.1-src-release\dspace\target\dspace-1.5.1-build.dir
 Exception in thread "main" org.dspace.authorize.AuthorizeException: Only
 administrators can create communities
 at org.dspace.content.Community.create(Community.java:193)
 
 This used to work! The only thing I have done different is I blew away the
 database because previous imports filled it with rubbish. I stopped tomcat,
 dropped the database, said ant fresh_install and restarted tomcat. I must
 have missed off something, but what?
 
 
 
 
