Hi,
*Solution :-*
I asked about this problem a while ago.
Here are the approaches I tried that failed:
1. I downloaded the "PDFBox" DLLs and exposed all the necessary methods. This
failed with security exceptions. I set the trust level to Full and tried
settings such as AllowPartiallyTrustedCallers, but all in vain: the
exceptions remained.
2. I tried the "iTextSharp" DLLs. The security exceptions went away, but
another error cropped up: "Resource or File not found". I checked the path
as well, but still got System.IO exceptions. :(
3. I tried "PDFToText.exe" with the following code:
=======================================
string strFile = Server.MapPath(FileUpload1.PostedFile.FileName);
System.Diagnostics.Process p = new System.Diagnostics.Process();
p.StartInfo.Arguments = " -raw -htmlmeta " + strFile + " E:\\PDFCont\\output.htm";
p.StartInfo.FileName = Page.MapPath("pdftotext.exe");
p.StartInfo.UseShellExecute = false;
p.StartInfo.CreateNoWindow = false;
p.StartInfo.RedirectStandardOutput = false;
p.Start();
p.WaitForExit();
System.Threading.Thread.Sleep(3000);
===============================
I was still stuck with System.IO exceptions.
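One possible cause of the System.IO exceptions in attempt 3, offered only as a guess: PostedFile.FileName is the name of the file on the client machine, so Server.MapPath may point at a file that was never saved on the server, and a path containing spaces is split into several arguments unless it is quoted. The sketch below shows one way to check for the file and quote both paths before starting pdftotext. The class and method names (PdfToTextRunner, Quote, BuildArguments, Run) are hypothetical and not part of any library.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class PdfToTextRunner
{
    // Wraps a path in double quotes so spaces do not split it into several arguments.
    public static string Quote(string path)
    {
        return "\"" + path + "\"";
    }

    // Builds the argument string: options first, then the input PDF, then the output file.
    public static string BuildArguments(string pdfPath, string outputPath)
    {
        return "-raw -htmlmeta " + Quote(pdfPath) + " " + Quote(outputPath);
    }

    // Starts pdftotext, waits for it to finish and returns its exit code.
    // Fails early with a clear message if the input PDF is not on the server.
    public static int Run(string exePath, string pdfPath, string outputPath)
    {
        if (!File.Exists(pdfPath))
            throw new FileNotFoundException("PDF not found on the server", pdfPath);

        ProcessStartInfo info = new ProcessStartInfo();
        info.FileName = exePath;
        info.Arguments = BuildArguments(pdfPath, outputPath);
        info.UseShellExecute = false;
        info.CreateNoWindow = true;

        using (Process p = Process.Start(info))
        {
            p.WaitForExit();
            return p.ExitCode;
        }
    }
}
```

From the page, the upload would first be saved with FileUpload1.SaveAs(savedPath), where savedPath is a server-side path of your choice, and then passed on with PdfToTextRunner.Run(Server.MapPath("pdftotext.exe"), savedPath, @"E:\PDFCont\output.htm").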
*Working Code :-*
Finally I tried the "ASPPDFLib" DLLs, and they worked well for me. :) I
needed to extract the text from the PDF file and generate XML from it. Here
is the working solution:
===========================================
using System;
using System.Collections;
using System.Data;
using System.IO;
using System.Reflection;
using ASPPDFLib;

protected void btnRead_Click(object sender, EventArgs e)
{
    // Extract the text, convert it to XML and load the XML into a DataSet
    string strExtractTxt = getPDFText(@"E:\PDFCont\Tabular.pdf");
    StringReader sr = new StringReader(GenerateXML(strExtractTxt));
    DataSet dsInfo = new DataSet();
    dsInfo.ReadXml(sr);
}

public string getPDFText(string pstrFilePath)
{
    IPdfManager objPdf = new PdfManager();
    // Open a PDF file for text extraction
    IPdfDocument objDoc = objPdf.OpenDocument(pstrFilePath, Missing.Value);
    string strText = "";
    // Append the text of every page
    foreach (IPdfPage objPage in objDoc.Pages)
    {
        strText += objPage.ExtractText(Missing.Value);
    }
    // Encode characters such as & and < so they cannot break the XML
    strText = Server.HtmlEncode(strText);
    return strText.Trim();
}

public string GenerateXML(string pstrExtractTxt)
{
    // Split on spaces and keep only the non-empty tokens
    string[] strSplitArr = pstrExtractTxt.Split(' ');
    ArrayList strSplitArrCopy = new ArrayList();
    string strXML = "<Information>";
    for (int x = 0; x < strSplitArr.Length; x++)
    {
        if (strSplitArr[x] != string.Empty) { strSplitArrCopy.Add(strSplitArr[x]); }
    }
    // Tokens 0..2 are the column headers (Name, Age, Sex in the sample table);
    // each following group of three tokens becomes one <Person> element
    int i = 0;
    for (int x = 3; x < strSplitArrCopy.Count; x++)
    {
        if (i == 0) strXML += "<Person>";
        strXML += "<" + strSplitArrCopy[i] + ">" + strSplitArrCopy[x] + "</" +
            strSplitArrCopy[i] + ">";
        if (i == 2) { strXML += "</Person>"; i = 0; }
        else i += 1;
    }
    // Close the last <Person> if the final row was incomplete
    if (i != 0) strXML += "</Person>";
    strXML += "</Information>";
    return strXML;
}
===============================================
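For anyone adapting GenerateXML, here is a rough variant of the same header-driven idea, not the code I actually ran. It assumes the extracted text may separate cells with line breaks or tabs as well as spaces, that the column count is known, and that every header is a valid XML element name. It builds the document with XElement (System.Xml.Linq, .NET 3.5 or later), which escapes the values itself, so the text would be passed in without HtmlEncode. The names TableXml and Generate are made up for this sketch.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class TableXml
{
    // Converts whitespace-separated table text into XML.
    // The first columnCount tokens are the column headers; every following
    // group of columnCount tokens becomes one <Person> element.
    public static string Generate(string text, int columnCount)
    {
        // Split on any run of whitespace (spaces, tabs, line breaks)
        string[] tokens = Regex.Split(text.Trim(), @"\s+");
        XElement root = new XElement("Information");
        XElement person = null;

        for (int x = columnCount; x < tokens.Length; x++)
        {
            int col = x % columnCount;
            if (col == 0)
            {
                // Start a new row
                person = new XElement("Person");
                root.Add(person);
            }
            // XElement escapes characters such as & and < in the value.
            // It throws an XmlException if the header is not a valid element name.
            person.Add(new XElement(tokens[col], tokens[x]));
        }
        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

Called as TableXml.Generate(extractedText, 3) for the Name/Age/Sex sample, it returns one <Person> element per table row, and the result can be loaded with dsInfo.ReadXml(new StringReader(xml)) as before.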
I hope this solution helps anyone who is looking for PDF text extraction.
On Wed, Aug 18, 2010 at 2:15 AM, Aman Sharma <[email protected]> wrote:
> I used iTextSharp as well, but got this exception:
> "System.IO.IOException: C:\inetpub\wwwroot\ReadPDF\Tabular.pdf not
> found as file or resource."
>
> This happened even though the PDF file was in the correct place.
>
> Here is the code I used with iTextSharp:
>
>
> string strFile = Server.MapPath("Tabular.pdf");
> PdfReader pdfReader = new PdfReader(strFile);
>
> The exception above is thrown on the PdfReader line.
> Please let me know if you have any idea.
>
>
> On Tue, Aug 17, 2010 at 9:55 AM, alandgri <[email protected]> wrote:
>
>> iTextSharp can read from PDF files, if I recall correctly.
>>
>> On Aug 16, 12:34 am, Aman Sharma <[email protected]> wrote:
>> > Check out this site...
>> >
>> > http://www.pdf-technologies.com/demos/pdfTextExtract.aspx
>> >
>> > They have done exactly this.
>> >
>> > Use this tabular PDF as an example:
>> >
>> > *Name*  *Age*  *Sex*
>> > A       20     M
>> > B       18     F
>> > C       15     M
>> > D       1      F
>> >
>> > I tried PDFBox, which gave me a security exception.
>> >
>> > Now I am trying TallPDF.NET and PDFKit.NET. Let's see whether they
>> > will work for me. :)
>> >
>> > If anyone has any ideas about converting a tabular PDF to text, please
>> > guide me.
>> >
>> > On Sun, Aug 15, 2010 at 8:35 PM, Aman Sharma <[email protected]> wrote:
>> >
>> > > The data is in a PDF, in tabular form.
>> >
>> > > I tried to use the open-source "PDFBox", but I am getting the
>> > > following error:
>> >
>> > > *Security Exception*
>> > > *Description: *The application attempted to perform an operation not
>> > > allowed by the security policy. To grant this application the required
>> > > permission please contact your system administrator or change the
>> > > application's trust level in the configuration file.
>> >
>> > > *Exception Details: *System.Security.SecurityException: That assembly
>> > > does not allow partially trusted callers.
>> >
>> > > I don't know what to do.
>> >
>> > > On Sun, Aug 15, 2010 at 5:59 PM, Stephen Russell <[email protected]> wrote:
>> >
>> > >> On Sun, Aug 15, 2010 at 3:40 AM, Aman Sharma <[email protected]> wrote:
>> > >> > Hi,
>> >
>> > >> > Is it possible in ASP.NET to read data from a PDF file and show it
>> > >> > in a Data View?
>> > >> -----------------------------
>> >
>> > >> Is the data from an excel sheet or a report?
>> >
>> > >> There are different PDF tools for VS and some can manipulate the raw
>> > >> data where others just make the content.
>> >
>> > >> --
>> > >> Stephen Russell
>> >
>> > >> Sr. Production Systems Programmer
>> > >> CIMSgts
>> >
>> > >> 901.246-0159 cell