Re: [Dspace-tech] [Dspace-general] How to implements Lucene

2013-01-28 Thread helix84
Hi Álvaro,

first of all, please, don't cross-post. When you reply to this mail,
please don't include dspace-general. This question belongs to
dspace-tech, so let's continue the conversation there.

Your question is not entirely clear. If you want to build a search
engine (also known as a central index in the library world) using
Lucene, why are you asking at a DSpace mailing list?

DSpace happens to use Lucene (actually in two independent forms -
Lucene and Solr), but from the point of view of a search engine, that
doesn't matter. You should access its data via network protocols.

Now you have two options, depending on what you want to build:
a) if you want to build a metasearch engine (that issues requests to
multiple systems at the same time and waits for all of them to
respond), then to connect DSpace as a source you may use its Solr
search interface [1]
b) if you want to build a central index (that harvests its source
systems periodically and then queries only its own index), then
connect to DSpace via OAI-PMH and harvest all available items [2]
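
For option b), here is a minimal sketch of what a harvester does with one
OAI-PMH ListRecords response, assuming the oai_dc metadata format. The
sample XML, the identifiers in it and the function name are made up for
illustration; a real harvester would fetch the response over HTTP from the
repository's OAI endpoint and repeat the request for as long as a
resumptionToken comes back.

```python
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response (an assumption, not real DSpace output).
# A real one would be returned for a request such as
#   <base-url>?verb=ListRecords&metadataPrefix=oai_dc
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:123/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>First item</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:123/2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Second item</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "creator": [e.text for e in rec.iterfind(".//dc:creator", NS)],
        })
    # An absent or empty token means the harvest is complete.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


if __name__ == "__main__":
    harvested, next_token = parse_list_records(SAMPLE_RESPONSE)
    for record in harvested:
        print(record["identifier"], record["title"])
    print("next token:", next_token)
```

The returned records would then be fed into the central index, and the
token passed back to the source in the next ListRecords request.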

Also, if you're thinking of building upon Lucene, you may want to go
with Solr or ElasticSearch from the start. They both build on top of
Lucene.

[1] http://wiki.duraspace.org/display/DSPACE/Solr
[2] https://wiki.duraspace.org/display/DSDOC3x/OAI


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

--
Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
MVPs and experts. ON SALE this month only -- learn more at:
http://p.sf.net/sfu/learnnow-d2d
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

Re: [Dspace-tech] [Dspace-general] How to implements Lucene

2013-01-28 Thread helix84
On Mon, Jan 28, 2013 at 6:17 PM, Álvaro Vargas Quezada
al...@outlook.com wrote:
 OK, let's see if I understood: I want to have a central index that provides
 a basic interface for searching various data sources (i.e. databases). The
 method we want to use is the second one: harvest periodically and query
 its own index. I guess this way is faster.

OK

 So that means we are going to use DSpace + Lucene with the OAI
 protocol, isn't it?

As I wrote, this part is confusing. For your central index you don't
need DSpace at all, DSpace is just one of the sources that can be
connected. Right? Just confirming that we understand each other. If
you think you need DSpace for your central index to work, then we
don't understand each other and you need to explain more about what
you want to achieve.

 You recommend that I use the OAI
 protocol. I agree, but is this possible with other databases like SQL and
 Oracle?

So, the architecture would be your central index harvesting from
multiple sources, e.g.:

Central index:
* your own Lucene-based program

Sources
* DSpace (via OAI-PMH)
* Oracle (via SQL)
* some other source, e.g. via Solr
* yet another source, e.g. via CSV import
* ...

So SQL could be just another harvesting protocol (e.g. SELECT title,
author, keywords FROM items).
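
As a rough sketch of SQL as a harvesting protocol: the code below reads rows
with the SELECT quoted above and feeds them into a toy inverted index. Note
the assumptions: an in-memory SQLite database stands in for Oracle, a plain
dictionary stands in for Lucene, and the `id` column, the demo rows and all
function names are invented for illustration.

```python
import re
import sqlite3
from collections import defaultdict


def make_demo_source():
    """Create a throwaway in-memory database standing in for a real source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, "
        "author TEXT, keywords TEXT)")
    conn.executemany(
        "INSERT INTO items (id, title, author, keywords) VALUES (?, ?, ?, ?)",
        [
            (1, "Searching with Lucene", "Doe, Jane", "search; lucene"),
            (2, "Harvesting repositories", "Roe, Richard", "OAI-PMH; harvesting"),
        ])
    return conn


def harvest_sql(conn):
    """Harvest step: pull every row from the source."""
    return conn.execute(
        "SELECT id, title, author, keywords FROM items").fetchall()


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(rows):
    """Index step: map each term to the set of item ids containing it."""
    index = defaultdict(set)
    for item_id, title, author, keywords in rows:
        text = " ".join(part for part in (title, author, keywords) if part)
        for term in tokenize(text):
            index[term].add(item_id)
    return index


def search(index, query):
    """Query step: return ids of items containing all query terms."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    central_index = build_index(harvest_sql(make_demo_source()))
    print(search(central_index, "jane lucene"))
```

The point of the split is that searches touch only the central index; the
source database is contacted only when the harvest step runs again.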


I also meant to include one more important piece of information in my
previous email, but I forgot: maybe you don't need to start from scratch.
Take a look at RCAAP [1] [2], and if you think you want something like
that, contact João Melo.

[1] http://www.rcaap.pt
[2] http://www.rcaap.pt/help.jsp


Regards,
~~helix84


Re: [Dspace-tech] [Dspace-general] How to implements Lucene

2013-01-28 Thread Álvaro Vargas Quezada
Perfect. I was confused about the use of DSpace. You're right, we don't need
DSpace at all; in the end, Lucene and those protocols will be enough.
Again, thank you very much for the information, helix84. I will investigate
RCAAP.
Greetings!
