I am building my code using Lucene 4.7.1 and Hadoop 2.4.0. Here is what I am trying to do:
Create Index
1. Build an index in a RAMDirectory based on data stored on HDFS.
2. Once built, copy the index onto HDFS.
Search Index
1. Bring the index stored on HDFS back into a RAMDirectory.
2. Perform a search on the in-memory index.
The error I am facing is:
Exception in thread "main" java.io.EOFException: read past EOF: RAMInputStream(name=segments_2)
    at org.apache.lucene.store.RAMInputStream.switchCurrentBuffer(RAMInputStream.java:94)
    at org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:67)
    at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:41)
    at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:326)
    at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
    at hdfs.SearchFiles.main(SearchFiles.java:85)
I did some research and found that this may be due to index corruption.
Below is my code.
Save index into HDFS.
    // Getting files present in memory into an array.
    String fileList[] = rdir.listAll();
    // Reading index files from memory and storing them to HDFS.
    for (int i = 0; i < fileList.length; i++) {
        IndexInput indxfile = rdir.openInput(fileList[i].trim(), null);
        long len = indxfile.length();
        int len1 = (int) len;
        // Reading data from file into a byte array.
        byte[] bytarr = new byte[len1];
        indxfile.readBytes(bytarr, 0, len1);
        // Creating file in HDFS directory with name same as that of
        // index file.
        Path src = new Path(indexPath + fileList[i].trim());
        dfs.createNewFile(src);
        // Writing data from byte array to the file in HDFS.
        FSDataOutputStream fs = dfs.create(
                new Path(dfs.getWorkingDirectory() + indexPath + fileList[i].trim()), true);
        fs.write(bytarr);
        fs.flush();
        fs.close();
    }
    FileSystem.closeAll();
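One thing I am not certain about in the block above is the path handling: the destination is built by plain string concatenation (`dfs.getWorkingDirectory() + indexPath + fileList[i].trim()`), which only yields the intended path if `indexPath` carries its own leading and trailing separators. A minimal JDK-only sketch of separator-safe joining (the path values here are made up for illustration; Hadoop's `Path(Path parent, String child)` constructor plays the same role):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathJoin {
    public static void main(String[] args) {
        // Paths.get inserts a separator between components, so "index"
        // and "segments_2" cannot fuse into "indexsegments_2".
        Path joined = Paths.get("/user/hadoop", "index", "segments_2");
        System.out.println(joined);  // /user/hadoop/index/segments_2
    }
}
```

If the path used for the write and the path used for the read ever disagree by a single separator, the index files land under different names than the reader expects, which could leave files like segments_2 missing or truncated.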
Bringing index from HDFS into RAMDirectory and using it.
    // Creating a RAMDirectory (memory) object, to be able to create index
    // in memory.
    RAMDirectory rdir = new RAMDirectory();
    // Getting the list of index files present in the directory into an
    // array.
    FSDataInputStream filereader = null;
    for (int i = 0; i < status.length; i++) {
        // Reading data from index files on HDFS directory into filereader
        // object.
        filereader = dfs.open(status[i].getPath());
        int size = filereader.available();
        // Reading data from file into a byte array.
        byte[] bytarr = new byte[size];
        filereader.read(bytarr, 0, size);
        // Creating file in RAM directory with name same as that of the
        // index file present in HDFS directory.
        filenm = new String(status[i].getPath().toString());
        String sSplitValue = filenm.substring(57, filenm.length());
        System.out.println(sSplitValue);
        IndexOutput indxout = rdir.createOutput(sSplitValue, IOContext.DEFAULT);
        // Writing data from byte array to the file in RAM directory.
        indxout.writeBytes(bytarr, bytarr.length);
        indxout.flush();
        indxout.close();
    }
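One detail I am unsure about in the loop above: as far as I know, `InputStream.available()` only estimates how many bytes can be read without blocking, and a single `read(bytarr, 0, size)` call may return fewer bytes than requested, which would leave the tail of the copied index file unread. A minimal JDK-only sketch of a read-fully loop (`ByteArrayInputStream` stands in here for the HDFS stream; the class and method names are mine):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFully {
    // Loops until exactly len bytes have been read, because a single
    // InputStream.read() call may return fewer bytes than requested.
    static byte[] readFully(InputStream in, int len) throws IOException {
        byte[] buf = new byte[len];
        int off = 0;
        while (off < len) {
            int n = in.read(buf, off, len - off);
            if (n < 0) {
                throw new IOException("stream ended after " + off + " of " + len + " bytes");
            }
            off += n;
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {10, 20, 30, 40, 50};
        byte[] copy = readFully(new ByteArrayInputStream(data), data.length);
        System.out.println(copy.length);  // prints 5
    }
}
```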