Hi Jona/Dimitris/All,

Thanks for all the help and for being so patient with me. *The setup has finally
been completed successfully.*

As you suggested, I am putting the step-by-step approach here so that,
after review, you can put it somewhere in a shared/common place. I also request
all DBpedia developers to update it from time to time in that single place. It will
be very helpful for all.

"

*Abstract Extraction from DBpedia dump*

*Software Requirements-*
MySQL
PHP with xml and apc
Scala
Maven
MediaWiki

Steps to be followed-

 *Step 1- Download extraction framework*

Download or pull the extraction framework using git.

*"git clone git://github.com/dbpedia/extraction-framework.git"*


*Step 2- Download DBpedia dumps if required*
If you want to download the dump files, run the commands below.

*"cd dump; ../clean-install-run download download.minimal.properties"*

The repository already contains some download configuration files in the
dump directory. Customize one according to your needs and run the above command to
download the dump.

In the download configuration file you define *base-dir*; the above command will download
the dump into this directory in the structure below.

*/path_to_download_folder/yyyymmdd/[language_code]wiki-yyyymmdd-pages-articles.xml.bz2*

*NOTE-* If you have already downloaded the pages-articles dump
manually (without using this utility), then skip step 2. But make sure
that the above naming convention for the directory structure has been
followed. If not, create this directory structure manually.
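
For a manually downloaded dump, the expected location can be worked out with a small helper. This is only a sketch: the function names are made up for illustration, the layout is taken literally from the description above and has not been checked against the framework's code, and the file name is assumed to follow the usual Wikimedia pattern [language_code]wiki-yyyymmdd-pages-articles.xml.bz2.

```python
from pathlib import Path


def expected_dump_path(base_dir: str, language_code: str, date: str) -> Path:
    """Build base_dir/yyyymmdd/[language_code]wiki-yyyymmdd-pages-articles.xml.bz2.

    Hypothetical helper: the layout is the one described in this post,
    not something read from the extraction framework itself.
    """
    if len(date) != 8 or not date.isdigit():
        raise ValueError("date must be in yyyymmdd form, e.g. 20130403")
    file_name = f"{language_code}wiki-{date}-pages-articles.xml.bz2"
    return Path(base_dir) / date / file_name


def dump_is_in_place(base_dir: str, language_code: str, date: str) -> bool:
    """Return True if the dump file already sits where the layout expects it."""
    return expected_dump_path(base_dir, language_code, date).is_file()
```

For example, expected_dump_path("/data/dumps", "en", "20130403") points at /data/dumps/20130403/enwiki-20130403-pages-articles.xml.bz2. If your version of the framework arranges the folders differently, adjust the function accordingly.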


*Step 3- Install Basic Software*
You need to install MySQL, PHP, Apache and other software.

To install and start the MySQL server, you can use
*dump/src/main/bash/mysql.sh*.
If you do not want to use this script, that is fine. Just make sure that all
the configuration parameters specified in this script have been added
to the MySQL configuration file, *my.cnf*.

Install PHP, Apache and MySQL properly. The installation of this software
is out of scope here. Please refer to their own documentation for it.

You also need to install *php-xml* and *php-apc*. Why? To avoid some errors
and performance issues which are described later in this document.

*NOTE-* On some Linux/Unix distributions the package *php-apc* may be named
*php-pecl-apc*. APC (Alternative PHP Cache) is a PHP accelerator.

I also came across a script which may be used for this setup. I
have not tested it, but it seems it should work fine.
https://github.com/saxenap/install-php-apc-mysql-amazon-linux-centos/blob/master/php-apc-mysql-script.sh

Finally, download MediaWiki from http://www.mediawiki.org/wiki/Download .
Using the latest stable release is recommended. You can also get the latest
code from git:
git clone https://gerrit.wikimedia.org/r/p/mediawiki/core.git


*Step 4- Trigger Import to MySQL*
In order to generate clean abstracts from Wikipedia articles one needs to
render wiki templates as they would be rendered in the original Wikipedia
instance. So in order for the DBpedia Abstract Extractor to work, a running
MediaWiki instance with Wikipedia data in a MySQL database is necessary.

To import the data, you need to run the Scala 'import' launcher.

Before importing you have to adapt the settings for the 'import' launcher
in *dump/pom.xml* as below.

"<launcher>
<id>import</id>
<mainClass>org.dbpedia.extraction.dump.sql.Import</mainClass>
<jvmArgs>
<jvmArg>-server</jvmArg>
</jvmArgs>
<args>
<arg>path_to_download_folder</arg>
<arg>/path_to_wikimedia_parent_dir/mediawiki/maintenance/tables.sql</arg>
<arg>jdbc:mysql://machine_name:mysql_port/?characterEncoding=UTF-8&amp;user=myuser&amp;password=mypass</arg>
<arg>false</arg><!-- require-download-complete -->
<arg>language-code</arg><!-- languages and article count ranges, comma-separated -->
</args>
</launcher>
"
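
Note that the JDBC URL sits inside an XML file, so every & between the URL parameters has to be written as &amp;. A minimal sketch of building that argument is shown here; the function names are hypothetical, the host, port, user and password are placeholders, and passwords containing characters that are special in URLs are not handled.

```python
from xml.sax.saxutils import escape


def jdbc_url(host: str, port: int, user: str, password: str) -> str:
    """Assemble the plain JDBC URL in the shape used by the launcher above."""
    return (
        f"jdbc:mysql://{host}:{port}/"
        f"?characterEncoding=UTF-8&user={user}&password={password}"
    )


def jdbc_arg(host: str, port: int, user: str, password: str) -> str:
    """Wrap the URL in an <arg> element, escaping & as &amp; for pom.xml."""
    return f"<arg>{escape(jdbc_url(host, port, user, password))}</arg>"
```

Calling jdbc_arg("localhost", 3306, "myuser", "mypass") yields a line that can be pasted into the args section of the 'import' launcher.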

If you have downloaded the dump file manually, then set
*require-download-complete*
to false, as no file exists to indicate a successful download.

Now, to import the data into MySQL, run
*../clean-install-run import*

*NOTE-*
a) If you get the error *ERROR 1283: Column 'si_title'
cannot be part of FULLTEXT index* while importing, then a collation has to be
specified for the table 'searchindex'. Hence, in tables.sql, change the closing
line of the searchindex table definition from

*) ENGINE=MyISAM*

to

*) ENGINE=MyISAM COLLATE='utf8_general_ci';*
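
If you prefer not to edit the file by hand, the same change can be sketched as a small text transformation. This is an untested illustration, not part of the framework: the function name is made up, and it assumes that the searchindex definition starts with a CREATE TABLE line containing "searchindex" and ends with a line starting with ") ENGINE=MyISAM". Check the result against your own tables.sql before using it.

```python
def add_searchindex_collation(sql: str, collation: str = "utf8_general_ci") -> str:
    """Append COLLATE='...' to the closing line of the searchindex table only."""
    patched = []
    in_searchindex = False
    for line in sql.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("CREATE TABLE"):
            # Remember whether the statement we just entered is searchindex.
            in_searchindex = "searchindex" in stripped
        elif in_searchindex and stripped.upper().startswith(") ENGINE=MYISAM"):
            if "COLLATE" not in stripped.upper():
                # Drop the trailing semicolon, add the collation, close again.
                line = stripped.rstrip(";").rstrip() + f" COLLATE='{collation}';"
            in_searchindex = False
        patched.append(line)
    result = "\n".join(patched)
    # splitlines() drops a final newline, so restore it if it was there.
    return result + "\n" if sql.endswith("\n") else result
```

The function leaves all other tables untouched and does nothing if the collation is already present, so running it twice is harmless.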



*Step 5- Prepare MediaWiki - Configuration and Settings*

To modify MediaWiki for DBpedia, just copy the three files from
https://github.com/dbpedia/extraction-framework/tree/master/dump/src/main/mediawiki
to the appropriate directory.

In newer code the change is already there. Still, check whether the patch
https://github.com/dbpedia/extraction-framework/commit/e36913dabe0715672cbf0f2e6c5d86ec424b08b3
has been applied to ApiParse.php.

Now download the required MediaWiki extensions listed at the end of
*LocalSettings.php*. For downloading extensions you may refer to
http://www.mediawiki.org/wiki/Download_from_Git

I first downloaded all the extensions using git, then copied all
required extensions into */path_to_mediawiki_parent_dir/mediawiki/extensions*,
keeping the folder structure.

Configure your MediaWiki directory as a web directory by adding
configuration information to httpd.conf as below.

“
Alias /mediawiki /path_to_mediawiki_parent_dir/mediawiki
<Directory /path_to_mediawiki_parent_dir/mediawiki>
Allow from all
</Directory>
”


*Step 6- Check whether the MediaWiki and PHP configuration is correct*
Now open this URL in your browser: http://machine/mediawiki/api.php?uselang=en

If you get usage instructions in your browser, then the MediaWiki
configuration has no issues. Skip everything below for this step and go to
the next step.
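
On a server without a browser, the same check can be sketched in a few lines. The function names are hypothetical and "machine" stands for your own host name. A successful HTTP response is only a rough indicator here; it does not prove that the page actually contains the usage instructions.

```python
from urllib.request import urlopen


def api_url(host: str, wiki_path: str = "/mediawiki", lang: str = "en") -> str:
    """Build the api.php URL used for the check in this step."""
    return f"http://{host}{wiki_path}/api.php?uselang={lang}"


def api_responds(host: str, timeout: float = 10.0) -> bool:
    """Return True if api.php answers with HTTP 200, False on any error."""
    try:
        with urlopen(api_url(host), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection failures as well as HTTP error statuses.
        return False
```

Calling api_responds("machine") after each fix saves reloading the page by hand, but the Apache error log still has to be read to see what went wrong.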

If you are not getting the usage information, then resolve the errors one by one,
reloading the above URL after every fix, until you get the
usage instructions. Also keep checking the Apache error log and
troubleshoot the errors displayed there. Here are
some errors and solutions which I came across.

*---> Class 'DOMDocument' not found in LocalisationCache.php*
This error occurs because you have not installed the php-xml module which is
specified in step 3.

*---> If it asks you to set $wgShowExceptionDetails = true; in
LocalSettings.php*
Simply do it. It makes MediaWiki show full debugging information.

*---> After the above step you may get the error below.*
CACHE_ACCEL requested but no suitable object cache is present. You may
want to install APC.
Backtrace:
#0 [internal function]: ObjectCache::newAccelerator(Array)
#1 /mnt/ebs/framework/media_wiki/wikimedia/includes/objectcache/ObjectCache.php(85): call_user_func('ObjectCache::ne...', Array)
#2 /mnt/ebs/framework/media_wiki/wikimedia/includes/objectcache/ObjectCache.php(72): ObjectCache::newFromParams(Array)
#3 /mnt/ebs/framework/media_wiki/wikimedia/includes/objectcache/ObjectCache.php(44): ObjectCache::newFromId(3)
#4 /mnt/ebs/framework/media_wiki/wikimedia/includes/GlobalFunctions.php(3780): ObjectCache::getInstance(3)
#5 /mnt/ebs/framework/media_wiki/wikimedia/includes/Setup.php(464): wfGetMainCache()
#6 /mnt/ebs/framework/media_wiki/wikimedia/includes/WebStart.php(157): require_once('/mnt/ebs/framew...')
#7 /mnt/ebs/framework/media_wiki/wikimedia/api.php(47): require('/mnt/ebs/framew...')
#8 {main}

This means that you have not installed *php-apc*. It is a PHP accelerator
that speeds up the process around 4-5 times.

*If you really do not want to use php-apc, then set
$wgMainCacheType=CACHE_ANYTHING. But it will have a significant impact
on performance. (Not recommended)*
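
The errors listed in this step can be collected into a small lookup that scans a copy of the Apache error log. This is a sketch with made-up names; it only knows the three messages described in this post, and the suggested fixes are the ones given above.

```python
from typing import List

# Error text as it appears in the log, paired with the fix from this post.
KNOWN_ERRORS = [
    ("Class 'DOMDocument' not found",
     "Install the php-xml module (see step 3)."),
    ("wgShowExceptionDetails",
     "Set $wgShowExceptionDetails = true; in LocalSettings.php."),
    ("CACHE_ACCEL requested but no suitable object cache is present",
     "Install php-apc, or set $wgMainCacheType=CACHE_ANYTHING (slower)."),
]


def suggest_fixes(log_text: str) -> List[str]:
    """Return the fixes for every known error message found in log_text."""
    return [fix for marker, fix in KNOWN_ERRORS if marker in log_text]
```

For example, passing the text of the backtrace above to suggest_fixes returns the php-apc hint. Errors that are not in the list simply produce no suggestion.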


*Step 7- Trigger the abstract extraction with proper settings in the
extraction.abstracts.properties file*

*../clean-install-run extraction extraction.abstracts.properties*
"

-- 
Regards
Gaurav Pant
+91-7709196607,+91-9405757794
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
